This article was published as a part of the Data Science Blogathon.
This article will help data scientists further their study of recommendation systems so that they can develop applications for professional use. We introduce content-based filtering for recommendation systems, show how to use it to predict similar items, and work through an example on an Amazon dataset.
Recommendation systems rely on two main techniques: collaborative filtering and content-based filtering. In this blog we focus mainly on content-based filtering.
Today, recommendation systems are an integral part of our lives. At Amazon, roughly 35% of revenue is reportedly generated by its recommendation system, so it contributes a major chunk of the company’s revenue. Working on recommendation algorithms is one of my favourite things to do. When I come across a recommendation engine on a website, I immediately want to dissect it and understand how it works. It’s one of the many perks of being a data scientist!
In collaborative filtering, we use the reviews that users give to items, and from those reviews we find users who share the same interests as other users.
In content-based filtering, we recommend items similar to what the user already likes, based on their interests.
Source: Wikipedia
data.columns # prints column-names or feature-names.
data = data[['asin', 'brand', 'color', 'medium_image_url', 'product_type_name', 'title', 'formatted_price']]
print('Number of data points:', data.shape[0],
      'Number of features:', data.shape[1])
data.head() # prints the top rows in the table.
Source: Author’s GitHub Profile
Remove products that share the same image.
Source: Author’s GitHub Profile
# We use the list of stop words downloaded from the nltk library.
import nltk
from nltk.corpus import stopwords  # needed for stopwords.words() below

nltk.download('stopwords')
stop_words = set(stopwords.words('english'))
print('list of stop words:', stop_words)
{“couldn’t”, ‘such’, ‘where’, ‘too’, ‘are’, ‘ve’, ‘your’, ‘him’, ‘this’, “wouldn’t”, “didn’t”, ‘has’, ‘than’, ‘ll’, ‘very’, ‘who’, ‘having’, ‘for’, “should’ve”, ‘mightn’, ‘of’, ‘until’, ‘we’, ‘haven’, “you’d”, ‘while’, “shouldn’t”, ‘doing’, “mightn’t”, ‘just’, ‘through’, ‘own’, ‘o’, ‘what’, ‘any’, ‘will’, “weren’t”, ‘have’, “hadn’t”, ‘my’, ‘weren’, ‘most’, “aren’t”, ‘it’, ‘had’, ‘further’, ‘more’, ‘those’, ‘on’, ‘against’, “doesn’t”, ‘himself’, ‘their’, ‘few’, ‘being’, ‘you’, ‘below’, ‘in’, ‘here’, ‘be’, “mustn’t”, “wasn’t”, ‘nor’, ‘then’, ‘how’, “that’ll”, ‘a’, ‘hasn’, ‘mustn’, “needn’t”, ‘shouldn’, ‘by’, ‘doesn’, ‘hadn’, ‘y’, ‘herself’, “she’s”, ‘shan’, ‘do’, ‘d’, ‘an’, ‘ourselves’, ‘the’, ‘that’, ‘after’, ‘there’, “you’re”, ‘them’, ‘was’, ‘itself’, ‘hers’, ‘yours’, ‘needn’, ‘down’, ‘its’, “you’ll”, ‘didn’, “won’t”, ‘both’, ‘these’, ‘up’, ‘again’, ‘his’, ‘did’, ‘our’, ‘when’, ‘only’, ‘s’, ‘over’, ‘because’, ‘wasn’, ‘should’, ‘so’, ‘re’, ‘couldn’, ‘under’, ‘ain’, ‘at’, “it’s”, ‘as’, ‘he’, ‘all’, ‘does’, “don’t”, ‘won’, ‘whom’, ‘to’, ‘i’, “haven’t”, ‘ma’, ‘were’, “hasn’t”, ‘m’, ‘above’, ‘each’, ‘she’, “isn’t”, ‘between’, ‘they’, ‘am’, ‘no’, ‘myself’, ‘yourself’, ‘during’, ‘t’, ‘out’, ‘off’, ‘wouldn’, “you’ve”, ‘or’, ‘with’, ‘ours’, ‘before’, ‘same’, ‘which’, ‘into’, ‘now’, “shan’t”, ‘if’, ‘themselves’, ‘isn’, ‘about’, ‘yourselves’, ‘theirs’, ‘and’, ‘don’, ‘not’, ‘from’, ‘can’, ‘me’, ‘but’, ‘is’, ‘once’, ‘why’, ‘some’, ‘her’, ‘aren’, ‘been’, ‘other’}
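As a rough illustration of how such a stop-word list is typically applied to product titles, here is a minimal sketch. The helper `remove_stop_words` and the small `SAMPLE_STOP_WORDS` set are assumptions made for this example (the set is a hand-picked subset of the list printed above); they are not taken from the author's code, where the full `stop_words` set would be passed instead.

```python
# Hypothetical helper: drop stop words from a product title.
# SAMPLE_STOP_WORDS is a small hand-picked subset of the NLTK list printed above;
# in practice the full `stop_words` set would be used instead.
SAMPLE_STOP_WORDS = {"the", "for", "with", "and", "a", "in", "of"}


def remove_stop_words(title, stop_words):
    """Lowercase the title and keep only the words that are not stop words."""
    words = title.lower().split()
    return " ".join(w for w in words if w not in stop_words)


cleaned = remove_stop_words(
    "The Cotton Shirt for Women with a Floral Print", SAMPLE_STOP_WORDS
)
print(cleaned)  # cotton shirt women floral print
```

Removing these words leaves only the terms that describe the product, which is what the later text-to-vector step should focus on.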
from nltk.stem.porter import PorterStemmer
stemmer = PorterStemmer()
print(stemmer.stem('arguing'))
print(stemmer.stem('fishing'))
Output:
argu
fish
Here we use TF-IDF to convert text to a vector, which gives us one vector for each title.
Source: Towards Data Science
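To make the TF-IDF step concrete, here is a minimal pure-Python sketch of the computation. It assumes the plain definitions tf = (word count / title length) and idf = log(N / df); the function name `tfidf_vectors` and the three toy titles are made up for illustration. Libraries such as scikit-learn add smoothing and normalisation, so their numbers will differ from this sketch.

```python
import math
from collections import Counter


def tfidf_vectors(titles):
    """Return (vocabulary, vectors) with one TF-IDF vector per title.

    Assumes tf = count / len(title) and idf = log(N / df), without the
    smoothing or normalisation that library implementations usually apply.
    """
    tokenized = [t.lower().split() for t in titles]
    vocab = sorted({w for words in tokenized for w in words})
    n_docs = len(tokenized)
    # Document frequency: number of titles that contain each word.
    df = Counter(w for words in tokenized for w in set(words))
    vectors = []
    for words in tokenized:
        counts = Counter(words)
        vectors.append([
            (counts[w] / len(words)) * math.log(n_docs / df[w]) if words else 0.0
            for w in vocab
        ])
    return vocab, vectors


# Toy titles (not from the Amazon dataset).
titles = ["blue cotton shirt", "red cotton shirt", "blue denim jeans"]
vocab, vectors = tfidf_vectors(titles)
for title, vec in zip(titles, vectors):
    print(title, [round(x, 3) for x in vec])
```

Words that appear in only one title (such as "denim") get a higher weight than words shared by several titles (such as "blue"), which is the property that makes TF-IDF useful for telling products apart.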
Now that we have a vector for each title, we use Euclidean distance to measure similarity: products whose distance to the query product is very small are treated as similar products.
Source: Tutorial Example
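The ranking idea can be sketched as follows. The helpers `euclidean_distance` and `most_similar` and the toy vectors are hypothetical names and made-up numbers chosen for this example; in the article's setting the vectors would be the TF-IDF title vectors.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(query_index, vectors, top_n=2):
    """Return (index, distance) pairs for the top_n nearest products,
    excluding the query product itself."""
    distances = [
        (i, euclidean_distance(vectors[query_index], vec))
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    return sorted(distances, key=lambda pair: pair[1])[:top_n]


# Toy title vectors (made-up numbers, not from the Amazon dataset).
vectors = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 1.0],
]
nearest = most_similar(0, vectors)
print(nearest)
```

Product 1 comes first because its vector is almost identical to the query vector, while product 2 is far away and ranks last.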
Source: Author’s GitHub Profile
Source: GitHub Profile
Here we can see that the results focus more on colour and brand.
Source: https://neurohive.io/en/popular-networks/vgg16/
The output of the VGG16 model.
So here we provide 5 solutions for finding similar products. To compare them, we can perform A/B testing.
For more about A/B testing, see https://en.wikipedia.org/wiki/A/B_testing
For the full code, see https://github.com/shivambaldha/Amazon-Apparel-Recommendations
Recommendation systems are a powerful tool for adding value to a company: they help users find the things they wish to purchase from a business. They are quickly becoming a critical element of online e-commerce.
The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.
• I have completed many Machine Learning and Deep Learning projects in the last year while studying with Applied AI. I gained hands-on experience in building machine learning models and learned how to tackle a problem and frame it as an ML problem.
• Data Scientist/ML Engineer with a strong math and computer science background and practical experience in building and deploying predictive models, implementing data processing, and applying machine learning algorithms to solve challenging business problems.
• I also have hands-on model-building skills in deep learning, with practical experience using the TensorFlow/Keras library and training model APIs using custom data.