Corpus Coverage

See what is strong, and what still needs enrichment

The scraped word database is now large enough to be managed like a product. This dashboard summarizes the corpus as coverage levels, gaps, and the next enrichment queues to work on.

6,276

total corpus entries

45%

average readiness across signals

Frequency Ranked

strongest current coverage

1,420

largest visible backlog

Highest leverage queue

Verb Conjugation Backlog

Verbs without forms cannot join conjugation or verb-pattern drills. Next: Add present-tense forms first.

Coverage Matrix

Frequency Ranked

Words with source ranks can be ordered into high-impact practice.

Strong

100%

6,274 / 6,276

CEFR Leveled

Words with A1-B1 labels can power cleaner learner progression.

Watch

52%

3,240 / 6,276

Topic Routed

Words with learner-facing topics can become focused routes and drills.

Needs work

13%

821 / 6,276

Example Backed

Example-backed words can train context instead of isolated recall.

Needs work

3%

177 / 6,276

Article Ready

Nouns with gender can power der, die, das article drills.

Strong

100%

3,189 / 3,189

Plural Ready

Nouns with plural forms can train exact German plural recall.

Strong

88%

2,793 / 3,189

Case Ready

Nouns with declensions can train nominative, accusative, dative, and genitive forms.

Needs work

3%

97 / 3,189

Verb Conjugations

Verbs with forms can power present-tense and verb-pattern drills.

Needs work

3%

40 / 1,460
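
The percentages in the matrix can be reproduced from the counts shown on each card. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each percentage is rounded to a whole number and that "average readiness across signals" is the unweighted mean of the eight card percentages. That assumption happens to give the 45% shown at the top of the page, but the dashboard does not state how the figure is computed.

```python
# Counts copied from the Coverage Matrix cards: (covered, eligible) per signal.
SIGNALS = {
    "Frequency Ranked": (6274, 6276),
    "CEFR Leveled": (3240, 6276),
    "Topic Routed": (821, 6276),
    "Example Backed": (177, 6276),
    "Article Ready": (3189, 3189),
    "Plural Ready": (2793, 3189),
    "Case Ready": (97, 3189),
    "Verb Conjugations": (40, 1460),
}


def coverage_percent(covered: int, eligible: int) -> int:
    """Whole-number coverage percentage (the rounding rule is an assumption)."""
    return round(100 * covered / eligible)


def average_readiness(signals: dict[str, tuple[int, int]]) -> int:
    """Unweighted mean of the per-signal percentages (assumed definition)."""
    percents = [coverage_percent(c, e) for c, e in signals.values()]
    return round(sum(percents) / len(percents))


def backlog(covered: int, eligible: int) -> int:
    """Entries that still lack the signal."""
    return eligible - covered


if __name__ == "__main__":
    for name, (covered, eligible) in SIGNALS.items():
        print(f"{name}: {coverage_percent(covered, eligible)}% "
              f"({backlog(covered, eligible)} remaining)")
    print("Average readiness:", average_readiness(SIGNALS))
```

Under these assumptions, the verb backlog works out to 1,460 - 40 = 1,420, which matches the backlog figure shown in the summary.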

Remaining Work Queues

These are the concrete corpus enrichment paths that would make the learner experience stronger next.

7 queues

Level Distribution

Shows where the current database supports learner progression.

A1: 846 (13%)
A2: 692 (11%)
B1: 1,702 (27%)
Unleveled: 3,036 (48%)
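
As a quick consistency check, the level counts can be turned back into shares and compared with the CEFR Leveled card. This is a minimal sketch with hypothetical names; it assumes shares are rounded to whole percentages of the full corpus.

```python
# Counts copied from the Level Distribution list.
LEVEL_COUNTS = {"A1": 846, "A2": 692, "B1": 1702, "Unleveled": 3036}


def level_shares(counts: dict[str, int]) -> dict[str, int]:
    """Share of the corpus per level, as whole percentages (rounding assumed)."""
    total = sum(counts.values())
    return {level: round(100 * n / total) for level, n in counts.items()}


def leveled_total(counts: dict[str, int]) -> int:
    """Number of entries that carry any CEFR label."""
    return sum(n for level, n in counts.items() if level != "Unleveled")


if __name__ == "__main__":
    print(level_shares(LEVEL_COUNTS))
    print("Leveled entries:", leveled_total(LEVEL_COUNTS))
```

The labeled levels sum to 3,240 entries, the same count shown on the CEFR Leveled card, and all four rows together sum to the 6,276 total corpus entries.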

Word Class Distribution

Shows what kinds of language the corpus can train right now.

Noun: 3,189 (51%)
Verb: 1,460 (23%)
Adjective: 998 (16%)
Adverb: 375 (6%)
Pronoun: 66 (1%)
Numeral: 62 (1%)
Article: 2 (0%)

Top Learner Topics

Shows the strongest non-technical topic routes in the corpus.