
URL: https://www.analyticsvidhya.com/blog/2023/02/finding-the-best-hotel-based-on-reviews-using-web-scrapping/


Finding the Best Hotel Based on Reviews Using Web Scraping

Sonia Singla Last Updated : 14 Feb, 2023
6 min read

Introduction

Suppose you buy a product online, and afterwards the seller emails you to ask for a review. For example, after I bought a product on Amazon and received it, I got an email asking: "Do you have a moment? We'd love to know how everything worked out for you. Kindly review the item you recently purchased from the website." The main reason is to learn about the quality of the product and the customer's needs.

The same happens with airline services, or with Google Maps reviews of recently visited places.

Here is a review given by a guest of the botanical garden in Birmingham.

[Image: hotel reviews]

Just like the botanical garden reviews, we will extract hotel reviews by scraping the website of one hotel and then run sentiment analysis on them.

Why Hotel Reviews?

How do you find a recommended hotel or cafe? As a data scientist, how would you find the best hotel based on reviews?

Choosing a good hotel, cafe, or theater is always a big problem. People look for the best-recommended hotels, so reviews and comments matter to both the owner and the customer. With the help of reviews, hotel staff and managers can improve quality, so it is a win-win situation for both.

Detailed data such as hotel reviews is collected by scraping websites.

Learning Objectives

  1. Understand the purpose of hotel reviews.
  2. Understand the tools used for web scraping.
  3. Understand the differences among the various scraping tools.

This article was published as a part of the Data Science Blogathon.


What is Web Scraping?

Web scraping extracts data from websites. The data is then saved, typically in CSV format, for further analysis.
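As a small illustration of the "save as CSV" step, the sketch below writes a few records to a CSV file with Python's standard csv module and reads them back. The records, the column names, and the file name sample_reviews.csv are made up for the example; they do not come from any real website.

```python
import csv

# Hypothetical scraped records; in practice these would come from a parser.
rows = [
    {"date": "January 2023", "review": "Clean room and friendly staff."},
    {"date": "February 2023", "review": "Good location, small bathroom."},
]


def save_rows(path, records):
    """Write a list of dicts to a CSV file with a header row."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["date", "review"])
        writer.writeheader()
        writer.writerows(records)


def load_rows(path):
    """Read the CSV file back into a list of dicts."""
    with open(path, newline="", encoding="utf-8") as fh:
        return [dict(row) for row in csv.DictReader(fh)]


save_rows("sample_reviews.csv", rows)
print(load_rows("sample_reviews.csv"))
```

Keeping one record per row with a fixed set of columns makes the file easy to load later with pandas or a spreadsheet.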

But why is it necessary to get a large amount of data from websites?

Web scraping can help a business move forward.

It can be used to compare product prices by collecting data from online shopping websites. Businesses that use email as a marketing medium collect email addresses and send emails. It can also collect data from Twitter and other social media websites such as Facebook. ParseHub is one free tool available for this.

There are various tools for scraping data:

1. Beautiful Soup

2. Scrapy

3. Selenium

Understanding the Difference Between Various Scraping Tools

1. Ease of learning: For beginners, Beautiful Soup is the simplest library to start with.

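from bs4 import BeautifulSoup
import urllib.request as req

# Address of the page to fetch (the same site the other examples use)
url = "https://www.tripadvisor.co.uk"
request = req.Request(url)
res = req.urlopen(request)
soup = BeautifulSoup(res, 'html.parser')
# Read the text inside the page's <title> tag
title = soup.find("title").text
print(title)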

Selenium works differently: it uses a Chrome driver to control a browser and extract the contents.

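from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

url = "https://www.tripadvisor.co.uk"
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get(url)
title = driver.find_element(By.TAG_NAME, "title").get_attribute('text')
print(title)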

Scrapy works differently again: you define a spider class inside a Scrapy project, and the framework runs it to produce the desired results.

import scrapy

class TitleSpider(scrapy.Spider):
    name = 'title'
    start_urls = ['https://www.tripadvisor.co.uk']

    def parse(self, response):
        # ::text selects the text inside <title>; get() returns the first match
        yield {'name': response.css('title::text').get()}

2. Speed: Scrapy is faster than Beautiful Soup and Selenium. It sends requests asynchronously, so many pages can be downloaded concurrently instead of one after another.
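To illustrate why overlapping requests speeds up scraping, here is a minimal sketch that compares fetching five pages one after another with fetching them concurrently. It does not use Scrapy and does not touch the network: fake_fetch is a hypothetical stand-in that simply sleeps, and a thread pool is used only to show the general idea (Scrapy itself relies on an asynchronous event loop rather than threads).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_fetch(page_number):
    """Stand-in for a network request: waits briefly, then returns a label."""
    time.sleep(0.1)  # simulated network latency
    return "page-{}".format(page_number)


pages = list(range(1, 6))

# One request after another: total time is roughly the sum of the waits.
start = time.perf_counter()
sequential = [fake_fetch(p) for p in pages]
sequential_seconds = time.perf_counter() - start

# Overlapping requests: the waits happen at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=5) as pool:
    overlapped = list(pool.map(fake_fetch, pages))
overlapped_seconds = time.perf_counter() - start

print(sequential == overlapped)
print(round(sequential_seconds, 2), round(overlapped_seconds, 2))
```

Both approaches return the same pages; only the waiting time differs, which is where most of a scraper's running time usually goes.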

3. Documentation: Beautiful Soup's documentation is the most approachable of the three. Selenium and Scrapy have extensive material, but the technical jargon can overwhelm many novices.

Beautiful Soup

Beautiful Soup is a Python library for parsing HTML and XML. It extracts data from pages that have been fetched with an HTTP request.

The process has three steps, and an analogy helps with the first one.

Suppose you want to visit a friend or colleague: you walk up to the house and ring the bell to ask permission to come in. A Hypertext Transfer Protocol (HTTP) request does the same thing: it knocks on the website's door to gain access to its contents.
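In code, "knocking on the door" means preparing an HTTP request and sending it. The sketch below uses the standard urllib.request module; build_request and fetch_html are hypothetical helper names, https://www.example.com/ is a placeholder address, and the User-Agent value is an arbitrary choice. Only the request object is built here, so nothing is sent over the network until fetch_html is called.

```python
import urllib.request as req


def build_request(url, user_agent="Mozilla/5.0"):
    """Prepare a GET request with a User-Agent header (nothing is sent yet)."""
    return req.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url):
    """Send the request and return the response body as text."""
    with req.urlopen(build_request(url), timeout=10) as res:
        return res.read().decode("utf-8", errors="replace")


request = build_request("https://www.example.com/")
print(request.get_method())
print(request.full_url)
# urllib stores header names capitalized as "User-agent"
print(request.get_header("User-agent"))
```

Calling fetch_html(url) would return the page's HTML as a string, which is what the parsing step works on.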

First, we connect to the website with an HTTP request.
Second, we use Beautiful Soup to parse the text of the response.

The request returns HTML or XML content. It is made up of tags such as the headings H1 and H2 (H2 ranks just below H1), the division tag div, and span, which marks up a short piece of text.
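To make these tags concrete, here is a minimal sketch that parses a small HTML fragment with Beautiful Soup and pulls out the h1, h2, div, and span elements. The fragment, the hotel name, and the class names review and score are invented for the example.

```python
from bs4 import BeautifulSoup

# A made-up page fragment, only for illustration.
html = """
<div class="review">
  <h1>Hotel Example</h1>
  <h2>Guest reviews</h2>
  <span class="score">8.5</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text() returns its text.
heading = soup.find("h1").get_text(strip=True)
subheading = soup.find("h2").get_text(strip=True)
score = soup.find("span", {"class": "score"}).get_text(strip=True)
block = soup.find("div", {"class": "review"})

print(heading, subheading, score)
print(block.name)
```

The same find and find_all calls, with the class names of the real page, are what a review scraper relies on.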

Third, we store the data locally.

After extraction, the data is stored in a structured CSV file.

import itertools
import urllib.request as req
from urllib.error import HTTPError

import pandas as pd
from bs4 import BeautifulSoup

# Identify the client to the server with a browser-like User-Agent header
user_agent = 'Chrome/108.0.0.0'
headers = {'User-Agent': user_agent}

# Review-page URL without the page number, which is appended below
base_url = "https://www.booking.com/reviews/gb/hotel/ibis-edinburgh-centre-st-andrew-square.en-gb.html?aid=357028;label=bin859jc-1DEgdyZXZpZXdzKIICOOgHSAlYA2hQiAEBmAEJuAEXyAEM2AED6AEB-AECiAIBqAIDuALorsuNBsACAdICJGJmMjRmNDIwLTMyMTQtNDVjZS05MDNmLTlhY2NjOWM1MTQ0ZdgCBOACAQ;sid=f12de9e50dc512b87c50bcfe5e61d99b;customer_type=total;hp_nav=0;old_page=0;order=featuredreviews;page="

# Fetch a single page (page 4) and parse it to inspect the HTML
url = "{}{};r_lang=en;rows=75&".format(base_url, 4)
request = req.Request(url, None, headers)
res = req.urlopen(request)
bs = BeautifulSoup(res, 'lxml')
bs

# Build the URLs of review pages 1 to 24
url_l = ["{}{};r_lang=en;rows=75&".format(base_url, page) for page in range(1, 25)]

data = []   # review text, one list per page
data1 = []  # stay dates, one list per page
data2 = []  # reviewer names, one list per page

for pg in url_l:
    print(pg)
    try:
        page = req.urlopen(req.Request(pg, None, headers))
    except HTTPError:
        continue  # skip pages that fail to load

    soup = BeautifulSoup(page, 'html.parser')
    ls2 = [x.get_text(strip=True) for x in soup.find_all("div", {"class": "review_item_review_content"})]
    ls3 = [x.get_text(strip=True) for x in soup.find_all("p", {"class": "review_staydate"})]
    ls4 = [x.get_text(strip=True) for x in soup.find_all("p", {"class": "reviewer_name"})]
    data.append(ls2)
    data1.append(ls3)
    data2.append(ls4)

# Flatten the per-page lists into one list per column
f = list(itertools.chain(*data))
f1 = list(itertools.chain(*data1))
f2 = list(itertools.chain(*data2))

df = pd.DataFrame(f1, columns=['Date'])
df['Content'] = f
df['Reviewer-Name'] = f2
df.to_csv('HotelReviews.csv', index=False, header=True)
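One thing to watch in the code above: assigning a list as a DataFrame column requires it to have the same length as the existing rows, so the script fails if some reviews are missing a date or a name. The sketch below shows one possible way to avoid that error by padding the shorter list with itertools.zip_longest. The sample lists are made up, and the variable names are hypothetical.

```python
import itertools

# Made-up per-page results: a list of lists, as a scraping loop produces.
content_pages = [["Great stay", "Noisy room"], ["Lovely breakfast"]]
date_pages = [["Stayed in January 2023"], ["Stayed in February 2023"]]

# Flatten the per-page lists into one list per column.
contents = list(itertools.chain.from_iterable(content_pages))
dates = list(itertools.chain.from_iterable(date_pages))

# Pad the shorter column with empty strings so every row is complete.
rows = list(itertools.zip_longest(dates, contents, fillvalue=""))

for row in rows:
    print(row)
```

Padding only prevents the length error; it does not restore which date belongs to which review. Reading the date, text, and name from each review's own container element would likely be more robust.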

Scrapy

Scrapy is an open-source Python framework for building crawlers that extract data from websites and store it.

1. pip install scrapy

2. scrapy startproject myfirstscrapy1

3. In the project's spiders directory, create a Python file for the spider using a text editor such as Notepad.


First, we import the scrapy module.

import scrapy

We then define a spider class, CrawlingSpider, which sends the requests and handles the responses.

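class CrawlingSpider(scrapy.Spider):
    name = "crawling"

    def start_requests(self):
        ul = [
            'https://www.trivago.in/en-IN/lm/hotels-edinburgh-united-kingdom?search=101-2;101-5;101-53;101-6;101-9;200-20533',
            'https://www.trivago.in/en-IN/lm/hotels-edinburgh-united-kingdom?search=101-3;101-5;101-53;101-6;101-9;200-20533',
        ]
        # Send one request per URL; Scrapy passes each response to parse()
        for url in ul:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        # Build a file name from the second-to-last part of the URL
        page = response.url.split("/")[-2]
        filename = 'page-%s.html' % page
        # Save the raw HTML of the response under that name
        with open(filename, 'wb') as f:
            f.write(response.body)

Run the spider from the project directory:

scrapy crawl crawling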
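To see which file name the parse method derives, the string handling can be tried on its own, without running Scrapy. The sketch below applies the same split("/")[-2] rule to the two URLs used by the spider. The second function, filename_from_query, is a hypothetical alternative that is not part of the original spider; it names the file after the first code in the search parameter.

```python
from urllib.parse import urlsplit

urls = [
    "https://www.trivago.in/en-IN/lm/hotels-edinburgh-united-kingdom?search=101-2;101-5;101-53;101-6;101-9;200-20533",
    "https://www.trivago.in/en-IN/lm/hotels-edinburgh-united-kingdom?search=101-3;101-5;101-53;101-6;101-9;200-20533",
]


def filename_from_path(url):
    """Same rule as in parse(): second-to-last '/'-separated piece of the URL."""
    return "page-%s.html" % url.split("/")[-2]


def filename_from_query(url):
    """Hypothetical alternative: use the first code of the 'search' parameter."""
    query = urlsplit(url).query  # "search=101-2;101-5;..."
    first_code = query.split("=", 1)[1].split(";")[0]
    return "page-%s.html" % first_code


for u in urls:
    print(filename_from_path(u), filename_from_query(u))
```

With these two URLs the path-based rule produces the same name, page-lm.html, for both, so the second saved page would overwrite the first. A name derived from something that differs between the URLs avoids that.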

Selenium

Selenium is an open-source tool for automating web browsers, originally built for testing web applications. Jason Huggins created it in 2004: while testing an application, he found that driving the browser by hand was not productive, so he wrote a JavaScript program to automate the browser. That program, JavaScript Test Runner, was later named Selenium Core. Because of the browser's same-origin policy, Selenium Core had to be installed on the same domain as the application being tested.

For example, a script loaded from Google cannot act on a page served by Yahoo or any other site, because it belongs to a different origin. As a result, Selenium Core and the application had to be installed on the same domain. WebDriver is the modern approach: it replaces the injected JavaScript and automates the browser directly.
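The restriction described here is the browser's same-origin policy: two pages share an origin only if their scheme, host, and port all match. A minimal sketch of that comparison, using the standard urllib.parse module, is shown below. The helper names origin and same_origin are hypothetical, and the check is a simplification of what browsers actually do.

```python
from urllib.parse import urlsplit

# Ports implied by the scheme when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return the (scheme, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def same_origin(url_a, url_b):
    """True when both URLs have the same scheme, host, and port."""
    return origin(url_a) == origin(url_b)


print(same_origin("https://www.google.com/search", "https://www.google.com/maps"))
print(same_origin("https://www.google.com/", "https://www.yahoo.com/"))
```

The first pair shares an origin and the second does not, which is why a script served from one site cannot drive a page on the other.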

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager

# Start Chrome; webdriver_manager downloads a matching ChromeDriver
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
val = input("Enter a url: ")
driver.get(val)

# Continue only if the browser reports the URL that was requested
get_url = driver.current_url
if get_url == val:
    # Print the text of the first <div> on the page
    header = driver.find_element(By.TAG_NAME, 'div')
    print(header.text)

# Close the browser when done
driver.quit()
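The get_url == val check in the code above is an exact string comparison, so it can fail even when the right page loaded, for instance when the browser reports the address with a trailing slash that the user did not type. The sketch below shows one possible, more forgiving comparison. The helpers normalize and same_page are hypothetical names, and the rules are a simplification chosen for this example.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to comparable parts, treating an empty path as '/'."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    host = (parts.hostname or "").lower()
    return (parts.scheme.lower(), host, parts.port, path, parts.query)


def same_page(requested, reported):
    """Compare the URL the user typed with the one the browser reports."""
    return normalize(requested) == normalize(reported)


print(same_page("https://www.example.com", "https://www.example.com/"))
print(same_page("https://www.example.com/a", "https://www.example.com/b"))
```

In the Selenium script this would replace the condition with same_page(val, driver.current_url).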

Sentiment Analysis

Sentiment analysis is the process of determining whether a piece of text is positive, neutral, or negative. It is the contextual mining of text to reveal how people view a brand, which helps a company judge whether its product will be in demand in the market. The focus is on emotions (happy, sad, angry, etc.) and polarity (positive, negative, and neutral).


It gives a company feedback it can use to improve its services and grow its business.

Natural language processing (NLP) is used to determine whether text is positive, negative, or neutral. We can also search the data for emotions such as sadness and happiness.
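To show the basic idea behind polarity scoring, here is a toy lexicon-based scorer in plain Python. It is only a sketch: the word list and its scores are invented for this example, and real tools such as VADER use a much larger lexicon plus rules for negation, punctuation, and emphasis.

```python
import re

# Tiny made-up lexicon: positive words score above zero, negative below.
LEXICON = {
    "good": 1, "great": 2, "clean": 1, "friendly": 1,
    "bad": -1, "dirty": -2, "noisy": -1, "rude": -2,
}


def polarity(text):
    """Sum the scores of known words and map the total to a label."""
    words = re.findall(r"[a-z']+", text.lower())
    total = sum(LEXICON.get(word, 0) for word in words)
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "neutral"


samples = [
    "Great location and friendly staff",
    "The room was dirty and noisy",
    "We stayed for two nights",
]
for review in samples:
    print(review, "->", polarity(review))
```

Words that are not in the lexicon contribute nothing, so a sentence without any scored word comes out as neutral.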

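import matplotlib.pyplot as plt
import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the word list VADER needs, then create the analyzer
nltk.download('vader_lexicon')
sentiments = SentimentIntensityAnalyzer()

s = pd.read_csv("HotelReviews.csv")
print(s.head())

# Count how often each reviewer name appears and keep the ten most frequent
reviews = s["Reviewer-Name"].value_counts()
numbers = reviews[:10].index
quantity = reviews[:10].values
custom_colors = ["skyblue", "yellowgreen", 'tomato', "blue", "red"]

# Plot type assumed here: a pie chart of the ten most frequent names
plt.pie(quantity, labels=numbers, colors=custom_colors)
plt.title("Hotel Reviewers Name", fontsize=20)
plt.show()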
# Score the text of every review; str() guards against empty (NaN) cells
scores = [sentiments.polarity_scores(str(i)) for i in s["Content"]]
s["Positive"] = [x["pos"] for x in scores]
s["Negative"] = [x["neg"] for x in scores]
s["Neutral"] = [x["neu"] for x in scores]
print(s.head())
# Add up each score across all reviews
Positive = sum(s["Positive"])
Negative = sum(s["Negative"])
Neutral = sum(s["Neutral"])

def sent_score(a, b, c):
    # a, b, c: summed positive, negative, and neutral scores
    if (a > b) and (a > c):
        print("Positive 😊 ")
    elif (b > a) and (b > c):
        print("Negative 😠 ")
    else:
        print("Neutral 🙂 ")

sent_score(Positive, Negative, Neutral)

Neutral 🙂
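A "Neutral" result is common with this approach: VADER's neu score is the share of the text that carries no sentiment, and since most words in a sentence are neutral, the summed neu values often come out largest. One alternative is to label each review by VADER's compound score, using the conventional cut-offs of 0.05 and -0.05, and then count the labels. The sketch below assumes that approach. The compound scores in it are made up and stand in for polarity_scores(text)["compound"], and label_from_compound is a hypothetical helper name.

```python
from collections import Counter


def label_from_compound(score, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if score >= threshold:
        return "Positive"
    if score <= -threshold:
        return "Negative"
    return "Neutral"


# Made-up compound scores, one per review, for illustration only.
compound_scores = [0.84, 0.61, -0.42, 0.0, 0.33]

# Count how many reviews fall under each label.
counts = Counter(label_from_compound(score) for score in compound_scores)
overall = counts.most_common(1)[0][0]

print(dict(counts))
print(overall)
```

Counting labels per review gives each review one vote, so a few long, mostly neutral reviews cannot dominate the overall result.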

Conclusion

Reviews make it easier to find hotels and cafes: the comments and critiques left by users help others choose a good hotel, and web scraping is one of the best ways to collect them.

Key Points

1. We used Beautiful Soup to extract the data and discussed other tools. Beautiful Soup is a good option for beginners, and Python makes it simple to get started.

2. If a website relies on JavaScript to render its content before the data can be extracted, Selenium is probably your safest option.

3. Whether you want to write a small crawler or a large scraper that repeatedly searches the internet for updated data, Scrapy is the best option.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

I have done my Master of Science in Biotechnology and Master of Science in Bioinformatics from reputed universities. I have written a few research papers, reviewed them, and am currently an Advisory Editorial Board Member at IJPBS.
I look forward to opportunities in IT to utilize the skills I gained during work and internship.
https://aster28.github.io/SoniaSinglaBio/site/
