What you need to know before you start

Starts 4 June 2026 06:39

Ends 4 June 2026


Modern Tokenization Techniques for AI & LLMs

Master modern tokenization methods including BPE, WordPiece, and SentencePiece to optimize AI model performance and handle out-of-vocabulary challenges effectively.
via CodeSignal

177 courses


1 hour

Optional upgrade available

Intermediate

Self-paced

Free Certificate

Optional upgrade available

Overview

This course covers tokenization techniques used in modern AI models, including rule-based methods, subword tokenization (BPE, WordPiece, SentencePiece), and vocabulary optimizations. Learners will implement these methods and understand their impact on NLP model performance.
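As a rough illustration of the subword idea behind BPE mentioned above, the following minimal sketch learns merge rules from a tiny word list by repeatedly fusing the most frequent adjacent symbol pair. The function name `learn_bpe`, the toy corpus, and the tie-breaking rule are assumptions made for this example; they are not taken from the course material, and production tokenizers add byte-level handling, word-boundary markers, and much larger corpora.

```python
from collections import Counter


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` BPE merge rules from a list of words (toy version)."""
    # Each distinct word becomes a tuple of symbols, starting from single characters.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself
        # (an arbitrary but deterministic choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the chosen pair.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            out, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    out.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    out.append(symbols[i])
                    i += 1
            new_vocab[tuple(out)] += freq
        vocab = new_vocab
    return merges, vocab


# Hypothetical toy corpus: word frequencies are encoded by repetition.
corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
merges, segmented = learn_bpe(corpus, num_merges=2)
print(merges)
print(dict(segmented))
```

With this corpus, frequent endings are merged first, so a word such as "newest" ends up split into a few characters plus a longer shared suffix symbol rather than into individual letters.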

Syllabus

  • Unit 1: Introduction to Tokenization (Rule-Based Tokenization)
    Tokenize Text with NLTK
    Sentence Tokenization with NLTK
    Extract Monetary Values with Regex
    Tokenization Showdown with NLTK and spaCy
  • Unit 2: Byte-Pair Encoding (BPE) – Subword Tokenization
    Exploring Pre-trained Tokenizers with GPT-2
    Using Pre-trained Tokenizers with RoBERTa
    Comparing Tokenization with GPT-2 and RoBERTa
  • Unit 3: Comparing BPE, WordPiece, and SentencePiece in NLP
    WordPiece Tokenization Challenge
    Tokenization Techniques in Action
    Tokenization Techniques in Action
    Tokenization Techniques for Special Texts
  • Unit 4: Tokenization and Out-of-Vocabulary (OOV) Handling in NLP
    Tokenization Showdown BERT vs GPT-2
    Multilingual Tokenization Challenge
    Multilingual Tokenization and OOV Reduction
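
To make the WordPiece and OOV topics in the syllabus above more concrete, here is a small sketch of greedy longest-match-first splitting, the matching strategy commonly associated with WordPiece at inference time. The function `wordpiece_tokenize` and the vocabulary `toy_vocab` are hypothetical and exist only for this illustration; a real vocabulary is learned from data and holds tens of thousands of entries.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Greedy longest-match-first split of one word against a fixed vocabulary."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, shrinking until one matches.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" marker.
                candidate = prefix + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece fits at this position: the whole word becomes the unknown token.
            return [unk_token]
        tokens.append(piece)
        start = end
    return tokens


# Hypothetical toy vocabulary, assumed only for this example.
toy_vocab = {"token", "##ization", "##s", "un", "##believ", "##able"}

for w in ["tokenization", "tokens", "unbelievable", "xyz"]:
    print(w, wordpiece_tokenize(w, toy_vocab))
```

A word that never appeared whole in the vocabulary, such as "unbelievable" here, is still represented through known pieces, while a word with no matching pieces falls back to the unknown token. That fallback is the out-of-vocabulary case that subword vocabularies are designed to make rare.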

Topics

Computer Science