What you need to know before you start
Start 4 June 2026 07:56
End 4 June 2026
1 hour
Optional upgrade available
Intermediate
Progress at your own pace
Free Certificate
Optional upgrade available
Overview
This course covers tokenization techniques used in modern AI models, including rule-based methods, subword tokenization (BPE, WordPiece, SentencePiece), and vocabulary optimizations. Learners will implement these methods and understand their impact on NLP model performance.
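To give a sense of what the subword tokenization units involve, here is a minimal sketch of how BPE merges can be learned from a toy word list in plain Python. It is an illustration under simplifying assumptions, not the course's own material: the function names `learn_bpe`, `merge_pair`, and `apply_bpe` and the `</w>` end-of-word marker are choices made for this example, and ties between equally frequent pairs are broken by first occurrence.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return a copy of `symbols` with each adjacent occurrence of `pair` fused."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy example)."""
    # Each word starts as a tuple of characters plus an end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first-seen pair
        merges.append(best)
        # Rewrite every word with the chosen pair merged into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Tokenize a single word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
tokens = apply_bpe("lowly", merges)
```

With only two merges learned, an unseen word such as "lowly" is split into a known subword plus leftover characters rather than being mapped to a single unknown token, which is the behaviour the out-of-vocabulary unit builds on.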
Syllabus
- Unit 1: Introduction to Tokenization (Rule-Based Tokenization)
- Unit 2: Byte-Pair Encoding (BPE) – Subword Tokenization
- Unit 3: Comparing BPE, WordPiece, and SentencePiece in NLP
- Unit 4: Tokenization and Out-of-Vocabulary (OOV) Handling in NLP
Tokenize Text with NLTK
Sentence Tokenization with NLTK
Extract Monetary Values with Regex
Tokenization Showdown with NLTK and spaCy
Exploring Pre-trained Tokenizers with GPT-2
Using Pre-trained Tokenizers with RoBERTa
Comparing Tokenization with GPT-2 and RoBERTa
WordPiece Tokenization Challenge
Tokenization Techniques in Action
Tokenization Techniques in Action
Tokenization Techniques for Special Texts
Tokenization Showdown BERT vs GPT2
Multilingual Tokenization Challenge
Multilingual Tokenization and OOV Reduction
Subjects
Computer Science