Generative AI and LLMs: Architecture and Data Preparation
This course is part of multiple programs.
55,285 already enrolled
441 reviews
Recommended experience
What you'll learn
- Differentiate between generative AI architectures and models, such as RNNs, transformers, VAEs, GANs, and diffusion models
- Describe how LLMs, such as GPT, BERT, BART, and T5, are applied in natural language processing tasks
- Implement tokenization to preprocess raw text using NLTK, spaCy, and the Hugging Face BertTokenizer and XLNetTokenizer classes
- Create an NLP data loader in PyTorch that handles tokenization, numericalization, and padding for text datasets
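As a rough illustration of the tokenization objective, here is a minimal, library-free sketch that contrasts word-level and character-level tokenization. It is an assumption for illustration only: the helper names `simple_word_tokens` and `char_tokens` are hypothetical, and NLTK and spaCy apply far richer rules (contractions, URLs, language-specific exceptions) than this single regular expression.

```python
import re

# Hypothetical helper pattern: after lowercasing, each run of letters, digits,
# or apostrophes is one word, and every other non-space character
# (punctuation, symbols) becomes its own token.
_WORD_RE = re.compile(r"[a-z0-9']+|[^\sa-z0-9']")


def simple_word_tokens(text: str) -> list[str]:
    """Word-level tokenization with a single regular expression."""
    return _WORD_RE.findall(text.lower())


def char_tokens(text: str) -> list[str]:
    """Character-level tokenization: every non-whitespace character is a token."""
    return [ch for ch in text if not ch.isspace()]


if __name__ == "__main__":
    print(simple_word_tokens("Hello, world!"))  # ['hello', ',', 'world', '!']
    print(char_tokens("Hi there"))  # ['H', 'i', 't', 'h', 'e', 'r', 'e']
```

Word-level tokens keep sequences short but need a large vocabulary and cannot represent unseen words; character-level tokens need only a tiny vocabulary but produce much longer sequences. Subword tokenizers, such as the WordPiece scheme behind BertTokenizer, sit between the two.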
Details to know
4 assignments
There are 2 modules in this course
Ready to explore the exciting world of generative AI and large language models (LLMs)? This IBM course, part of the Generative AI Engineering Essentials with LLMs Professional Certificate, gives you practical skills to harness AI to transform industries.
Designed for data scientists, ML engineers, and AI enthusiasts, this course teaches you to differentiate between various generative AI architectures and models, such as recurrent neural networks (RNNs), transformers, generative adversarial networks (GANs), variational autoencoders (VAEs), and diffusion models. You'll also discover how LLMs, such as generative pretrained transformers (GPT) and bidirectional encoder representations from transformers (BERT), power real-world language tasks. Get hands-on with tokenization techniques using NLTK, spaCy, and Hugging Face, and build efficient data pipelines with PyTorch data loaders to prepare models for training. A basic understanding of Python and PyTorch, along with familiarity with machine learning and neural networks, is helpful but not mandatory. Enroll today and get ready to launch your journey into generative AI!
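Hugging Face's BertTokenizer splits words into WordPiece subwords, marking pieces that continue a word with a `##` prefix. The sketch below reproduces only the core greedy longest-match-first step against a small, hypothetical vocabulary; the real tokenizer also normalizes text, splits punctuation, caps word length, and adds special tokens such as `[CLS]` and `[SEP]`, while XLNetTokenizer relies on a SentencePiece model instead. The function names `wordpiece` and `subword_tokenize` are illustrative and not part of any library.

```python
def wordpiece(word: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedily split one word into the longest subword pieces found in vocab."""
    pieces: list[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first and shrink it until it is in vocab.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry "##"
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word becomes the unknown token
        pieces.append(match)
        start = end
    return pieces


def subword_tokenize(text: str, vocab: set[str]) -> list[str]:
    """Lowercase, split on whitespace, then apply WordPiece to each word."""
    return [piece for word in text.lower().split() for piece in wordpiece(word, vocab)]


if __name__ == "__main__":
    # Hypothetical toy vocabulary; a real BERT vocabulary has about 30,000 entries.
    toy_vocab = {"token", "##ization", "##izer", "play", "##ing", "un", "##related"}
    print(subword_tokenize("Tokenization playing", toy_vocab))
    # ['token', '##ization', 'play', '##ing']
```

Because rare words decompose into known pieces, a subword vocabulary stays compact while rarely falling back to the unknown token.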
In this module, you will learn about the significance of generative AI and how it is transforming various fields through content generation, code creation, and image synthesis. You will explore key generative AI architectures, such as generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, and transformers, and understand the differences in their training approaches. You'll also examine how large language models (LLMs) like generative pretrained transformers (GPT) and bidirectional encoder representations from transformers (BERT) are applied in building NLP-based applications. Finally, through a hands-on lab, you will create a simple chatbot using the Hugging Face transformers library and get introduced to essential tools and libraries used in generative AI development.
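To make the differences in training approaches concrete, the commonly cited objectives can be written side by side. This summary uses standard textbook notation and is not taken from the course: D and G are the GAN discriminator and generator, q_phi and p_theta are the VAE encoder and decoder, epsilon_theta is the diffusion noise-prediction network with cumulative noise schedule alpha-bar_t, and M is the set of masked token positions in BERT-style pretraining.

```latex
\begin{aligned}
\text{GAN:}\quad & \min_G \max_D \; \mathbb{E}_{x\sim p_{\text{data}}}\big[\log D(x)\big] + \mathbb{E}_{z\sim p(z)}\big[\log\big(1 - D(G(z))\big)\big] \\
\text{VAE:}\quad & \max_{\theta,\phi} \; \mathbb{E}_{q_\phi(z\mid x)}\big[\log p_\theta(x\mid z)\big] - D_{\mathrm{KL}}\big(q_\phi(z\mid x)\,\|\,p(z)\big) \\
\text{Diffusion:}\quad & \min_\theta \; \mathbb{E}_{x_0,\,t,\,\epsilon\sim\mathcal{N}(0,I)}\Big\|\epsilon - \epsilon_\theta\big(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\big)\Big\|^2 \\
\text{GPT (causal LM):}\quad & \max_\theta \; \sum_{t=1}^{T} \log p_\theta\big(x_t \mid x_{<t}\big) \\
\text{BERT (masked LM):}\quad & \max_\theta \; \sum_{i\in M} \log p_\theta\big(x_i \mid x_{\setminus M}\big)
\end{aligned}
```

Read row by row: GANs train two networks adversarially, VAEs maximize a lower bound on the data likelihood, diffusion models learn to predict the noise added to clean samples (the simplified DDPM-style loss shown here), and GPT- and BERT-style transformers maximize the likelihood of observed tokens, either left to right or at masked positions.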
What's included
5 videos, 3 readings, 2 assignments, 1 app item, 3 plugins
5 videos • Total 28 minutes
- Overview of AI Engineering with LLMs • 6 minutes
- Course Introduction • 3 minutes
- Significance of Generative AI • 6 minutes
- Generative AI Architectures and Models • 6 minutes
- Generative AI for NLP • 7 minutes
3 readings • Total 15 minutes
- IBM Product Spotlight: watsonx.governance • 2 minutes
- Course Overview • 10 minutes
- Summary and Highlights • 3 minutes
2 assignments • Total 25 minutes
- Graded Quiz: Generative AI Architecture • 15 minutes
- Practice Quiz: Generative AI Overview and Architecture • 10 minutes
1 app item • Total 60 minutes
- Lab: Exploring Generative AI Libraries • 60 minutes
3 plugins • Total 32 minutes
- Helpful Tips for Course Completion • 2 minutes
- Reading: Basics of AI Hallucinations • 10 minutes
- Reading: Overview of Libraries and Tools • 20 minutes
In this module, you will learn how to prepare data for training large language models (LLMs) by implementing tokenization and building data loaders. You will explore different tokenization methods and understand how tokenizers convert raw text into model-ready input. You will implement tokenization using NLTK, spaCy, and the Hugging Face BertTokenizer and XLNetTokenizer classes. Additionally, you will learn about the role of data loaders in the training pipeline and use the DataLoader class in PyTorch to create a data loader with a custom collate function that processes batches of text. These practical skills are essential for building efficient NLP pipelines for LLM training. Supporting materials, such as a cheat sheet and a glossary, reinforce your learning.
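The data-loader workflow described in this module (tokenize, numericalize, pad) can be sketched without any framework. This is a minimal, hypothetical example: the vocabulary ordering, the `<pad>`/`<unk>` tokens, the whitespace tokenizer, and the names `build_vocab`, `make_collate_fn`, and `batches` are assumptions for illustration, not the lab's code. In PyTorch, the returned `collate` function could be passed as the `collate_fn` argument of `torch.utils.data.DataLoader`, typically after converting the padded lists to tensors with `torch.tensor`.

```python
from collections import Counter
from typing import Callable, Iterator

PAD, UNK = "<pad>", "<unk>"
Tokenizer = Callable[[str], list[str]]


def build_vocab(texts: list[str], tokenize: Tokenizer) -> dict[str, int]:
    """Map tokens to ids: 0 = padding, 1 = unknown, then by descending frequency."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    # Ties in frequency are broken alphabetically so the ids are deterministic.
    itos = [PAD, UNK] + sorted(counts, key=lambda tok: (-counts[tok], tok))
    return {tok: idx for idx, tok in enumerate(itos)}


def make_collate_fn(vocab: dict[str, int], tokenize: Tokenizer):
    """Return a collate function that tokenizes, numericalizes, and pads a batch."""
    pad_id, unk_id = vocab[PAD], vocab[UNK]

    def collate(batch: list[str]) -> tuple[list[list[int]], list[int]]:
        # Tokenize each text and numericalize it; unseen tokens map to <unk>.
        ids = [[vocab.get(tok, unk_id) for tok in tokenize(text)] for text in batch]
        lengths = [len(seq) for seq in ids]
        # Right-pad every sequence to the longest one in this batch.
        max_len = max(lengths, default=0)
        padded = [seq + [pad_id] * (max_len - len(seq)) for seq in ids]
        return padded, lengths

    return collate


def batches(dataset: list[str], batch_size: int, collate_fn) -> Iterator:
    """Tiny stand-in for a DataLoader loop: fixed order, no shuffling, no workers."""
    for start in range(0, len(dataset), batch_size):
        yield collate_fn(dataset[start:start + batch_size])


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog", "a cat"]
    vocab = build_vocab(corpus, str.split)
    collate = make_collate_fn(vocab, str.split)
    for padded, lengths in batches(corpus, batch_size=2, collate_fn=collate):
        print(padded, lengths)
    # [[3, 2, 6], [3, 5, 0]] [3, 2]
    # [[4, 2]] [2]
```

Padding each batch only to its own longest sequence keeps batches compact, and the returned lengths let a model or a sequence-packing utility ignore the padded positions.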
What's included
2 videos, 6 readings, 2 assignments, 2 app items, 2 plugins
2 videos • Total 14 minutes
- Tokenization • 7 minutes
- Overview of Data Loaders • 7 minutes
6 readings • Total 14 minutes
- Data Quality and Diversity for Effective LLM Training • 5 minutes
- Summary and Highlights • 2 minutes
- What's Next: Explore IBM watsonx.governance • 1 minute
- Course Conclusion • 3 minutes
- Congratulations and Next Steps • 2 minutes
- Team and Acknowledgments • 1 minute
2 assignments • Total 25 minutes
- Graded Quiz: Data Preparation for LLMs • 15 minutes
- Practice Quiz: Preparing Data • 10 minutes
2 app items • Total 120 minutes
- Lab: Implementing Tokenization • 60 minutes
- Lab: Creating an NLP Data Loader • 60 minutes
2 plugins • Total 9 minutes
- Cheat Sheet: Generative AI and LLMs: Architecture and Data Preparation • 5 minutes
- Course Glossary: Generative AI and LLMs: Architecture and Data Preparation • 4 minutes
Earn a career certificate
Add this credential to your LinkedIn profile, resume, or CV. Share it on social media and in your performance review.
Learner reviews
- 5 stars: 77.65%
- 4 stars: 14.89%
- 3 stars: 4.74%
- 2 stars: 1.58%
- 1 star: 1.12%
Showing 3 of 441
Reviewed on Mar 24, 2025
Too fast reading of the slides without much of explanations.
Reviewed on Mar 2, 2025
I love the structure and the content in this course. I can't wait applying the skills I have acquired!
Reviewed on Feb 28, 2025
Was waiting for a course like this for a long time. Very happy with it. Library installation on labs seems a bit slow
Frequently asked questions
It will take only two weeks to complete this course if you spend two hours of study time per week.
A basic knowledge of Python and PyTorch, along with familiarity with machine learning and neural network concepts, is helpful.
This course is part of a specialization. When you complete the specialization, you will have the skills and confidence to pursue roles such as AI Engineer, NLP Engineer, Machine Learning Engineer, Deep Learning Engineer, and Data Scientist.
