What you need to know before you start
Starts 4 June 2026 06:39
Ends 4 June 2026
1 hour
Optional upgrade available
Intermediate
Progress at your own pace
Free Certificate
Optional upgrade available
Overview
This course covers tokenization techniques used in modern AI models, including rule-based methods, subword tokenization (BPE, WordPiece, SentencePiece), and vocabulary optimizations. Learners will implement these methods and understand their impact on NLP model performance.
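As a rough illustration of the subword idea named above, here is a minimal sketch of Byte-Pair Encoding in plain Python. It is not taken from the course materials: the function names `merge_pair`, `learn_bpe`, and `encode`, the toy word list, and the tie-breaking rule are all assumptions made for this example, and production tokenizers (GPT-2, RoBERTa) add byte-level handling and pre-tokenization that are left out here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Fuse every adjacent occurrence of `pair` in a tuple of symbols."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of training words."""
    # Start from characters: each distinct word is a tuple of single symbols.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties are broken lexicographically
        # (an arbitrary choice made here only to keep the result deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Apply the new merge to every word in the working vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Split a word into subword tokens by replaying merges in learned order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    rules = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(rules)                   # [('o', 'w'), ('l', 'ow')]
    print(encode("lowly", rules))  # ['low', 'l', 'y']
```

The unseen word "lowly" is still encoded, as the learned piece `low` plus single characters. That fallback to smaller pieces is how subword tokenizers avoid out-of-vocabulary tokens.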
Syllabus
- Unit 1: Introduction to Tokenization (Rule-Based Tokenization)
- Unit 2: Byte-Pair Encoding (BPE) – Subword Tokenization
- Unit 3: Comparing BPE, WordPiece, and SentencePiece in NLP
- Unit 4: Tokenization and Out-of-Vocabulary (OOV) Handling in NLP
Tokenize Text with NLTK
Sentence Tokenization with NLTK
Extract Monetary Values with Regex
Tokenization Showdown with NLTK and spaCy
Exploring Pre-trained Tokenizers with GPT-2
Using Pre-trained Tokenizers with RoBERTa
Comparing Tokenization with GPT-2 and RoBERTa
WordPiece Tokenization Challenge
Tokenization Techniques in Action
Tokenization Techniques in Action
Tokenization Techniques for Special Texts
Tokenization Showdown BERT vs GPT2
Multilingual Tokenization Challenge
Multilingual Tokenization and OOV Reduction
Subjects
Computer Science