What you need to know before you start
Starts 4 June 2026 06:39
Ends 4 June 2026
1 hour
Optional upgrade available
Intermediate
Learn at your own pace
Free Certificate
Overview
This course covers tokenization techniques used in modern AI models, including rule-based methods, subword tokenization (BPE, WordPiece, SentencePiece), and vocabulary optimizations. Learners will implement these methods and understand their impact on NLP model performance.
Syllabus
- Unit 1: Introduction to Tokenization (Rule-Based Tokenization)
- Unit 2: Byte-Pair Encoding (BPE) – Subword Tokenization
- Unit 3: Comparing BPE, WordPiece, and SentencePiece in NLP
- Unit 4: Tokenization and Out-of-Vocabulary (OOV) Handling in NLP
Tokenize Text with NLTK
Sentence Tokenization with NLTK
Extract Monetary Values with Regex
Tokenization Showdown with NLTK and spaCy
Exploring Pre-trained Tokenizers with GPT-2
Using Pre-trained Tokenizers with RoBERTa
Comparing Tokenization with GPT-2 and RoBERTa
WordPiece Tokenization Challenge
Tokenization Techniques in Action
Tokenization Techniques in Action
Tokenization Techniques for Special Texts
Tokenization Showdown BERT vs GPT2
Multilingual Tokenization Challenge
Multilingual Tokenization and OOV Reduction
Subjects
Computer Science