build_combined_token() — Pattern Engine Documentation

1. Purpose & Signature

The function build_combined_token() is the core pattern engine that ACE uses to generate Latin-script stems and internal vowels for:

  • Perfect / imperfect verbs (active and passive)
  • Imperatives / prohibitives
  • Active & passive participles
  • Verbal nouns (maṣdar)
  • Derived nouns (place, instrument, adjectives, exaggeration, colors, broken plurals, etc.)
  • Special weak patterns: mithāl, hollow, defective, doubled, hamzated verbs

The signature is:

function build_combined_token(
    $F,            // pattern code, usually "$form-$tense" (e.g. "1-2", "4-7", "8-1"),
                   // or a special code: "mth12", "hol11", "def21", "dup1", "hmz01", etc.
    $r0, $r1, $r2, // triliteral root letters (Latin: k, t, b, q, w, etc.)
    $pronIdx,      // pronoun index (1–14); used by some hollow / mithāl / Form VIII rules
    $infl1 = 'a',  // first internal vowel (faʿala vs faʿila vs faʿula)
    $infl2 = 'u',  // second internal / imperfect vowel (yaktubu vs yaktibu vs yaktabu)
    $tense = '1'   // tense index, mainly for reference
)

It returns an array:

[ $latin, $impVowel, $formPrefix, $V1, $V2, $doubledRadical ]

  • $latin: the core Latin pattern (stem and sometimes partial prefix).
  • $impVowel: the imperfect vowel slot (a or u) where relevant; in the "1-2" example in section 4 it is 'a', distinct from the internal vowel held in $V2.
  • $formPrefix: form-specific prefix, e.g., ta, sta, n.
  • $V1, $V2: internal vowel markers used by the analyzer and transliteration.
  • $doubledRadical: which radical is doubled (e.g., $r1 in Form II).
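
As a minimal sketch of how a caller might unpack the six slots, the snippet below uses positional array destructuring. The function build_combined_token_stub() is a hypothetical stand-in, not the real engine: it only reproduces the slot values that section 4 lists for "1-2" with root ktb.

```php
<?php
// Hypothetical stand-in for build_combined_token(). It is NOT the real engine;
// it only returns the slot values documented for "1-2" with root ktb.
function build_combined_token_stub(string $F, string $r0, string $r1, string $r2): array
{
    return ['a' . $r0 . $r1 . 'u' . $r2 . 'u', 'a', '', '', 'u', ''];
}

// Unpack the six slots by position into named variables.
[$latin, $impVowel, $formPrefix, $V1, $V2, $doubledRadical] =
    build_combined_token_stub('1-2', 'k', 't', 'b');

printf("latin=%s impVowel=%s V2=%s\n", $latin, $impVowel, $V2);
```

Because the slots are positional, the variable order on the left must match the order of the returned array exactly.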

2. Pattern Code & Tense Mapping

For regular forms, the pattern code $F is:

$F = "$form-$tense";   // e.g., "1-2", "4-7", "3-17"

Example mappings:

  • Form I, Tense 1 (Perfect Active): $F = "1-1"
  • Form I, Tense 2 (Imperfect Active): $F = "1-2"
  • Form I, Tense 7 (Perfect Passive): $F = "1-7"
  • Form I, Tense 15 (Active Participle): $F = "1-15"
  • Form I, Tense 17 (Verbal Noun): $F = "1-17"
  • … and similarly for Forms II–X.
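
The mapping above is plain string concatenation. A minimal sketch, assuming integer form and tense numbers; make_pattern_code() is a hypothetical helper and not part of ACE:

```php
<?php
// Hypothetical helper (not part of ACE): joins a form number and a tense
// index into the "$form-$tense" pattern code described above.
function make_pattern_code(int $form, int $tense): string
{
    return $form . '-' . $tense;
}

echo make_pattern_code(1, 2), "\n";   // 1-2
echo make_pattern_code(4, 7), "\n";   // 4-7
```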

The Stem Tester sometimes overrides $F with special codes for weak / doubled / hamzated roots, e.g.:

  • 'mth12', 'mth13' … — mithāl (initial weak) patterns
  • 'hol11', 'hol12' … — hollow verbs (e.g. qwl, qwm)
  • 'def21', 'def22' … — defective verbs (final weak)
  • 'dup1', 'dup7', 'dup…' — doubled radicals
  • 'hmz01' … — hamzated roots
  • and others like 'mfr001', '18-1' (place noun), '23-3' (broken plurals).
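
One way to tell the two code shapes apart is by pattern matching on the string. The sketch below is an assumption about shape only: describe_pattern_code() is hypothetical, it does not know which codes the engine implements, and it reports noun codes such as '18-1' as form-tense shaped because they look the same as verb codes.

```php
<?php
// Hypothetical classifier (not part of ACE). It inspects only the shape of
// the code string, not whether the engine actually handles that code.
function describe_pattern_code(string $F): array
{
    // Regular shape: digits, a hyphen, digits (e.g. "1-2", "18-1").
    if (preg_match('/^(\d+)-(\d+)$/', $F, $m)) {
        return ['kind' => 'form-tense', 'form' => (int) $m[1], 'tense' => (int) $m[2]];
    }
    // Special shape: a lowercase tag followed by digits (e.g. "hol11", "mfr001").
    if (preg_match('/^([a-z]+)(\d+)$/', $F, $m)) {
        return ['kind' => 'special', 'tag' => $m[1], 'number' => $m[2]];
    }
    return ['kind' => 'unknown'];
}

print_r(describe_pattern_code('4-7'));
print_r(describe_pattern_code('hol11'));
```

The numeric part of a special code is kept as a string so that leading zeros (as in 'mfr001') survive.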

3. Perfect Active & Passive Examples

Form I — Perfect Active (1-1)

F = "1-1", root = ktb, infl1 = 'u'
→ $latin = "katub"
→ Pattern: k a t u b → ka.tu.ba (faʿula type)

Form I — Perfect Passive (1-7)

F = "1-7", root = ktb
→ $latin = "kutib"
→ Pattern: k u t i b → ku.ti.ba (fuʿila)
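
The two Form I perfect stems above can be reproduced by simple concatenation of radicals and vowels. This is a sketch under the assumption that each radical arrives as its own string; both function names are hypothetical and the real engine may build these stems differently.

```php
<?php
// Hypothetical sketch of the Form I perfect stems shown above.
// Active: r0 + a + r1 + infl1 + r2 (infl1 selects faʿala / faʿila / faʿula).
function form1_perfect_active(string $r0, string $r1, string $r2, string $infl1 = 'a'): string
{
    return $r0 . 'a' . $r1 . $infl1 . $r2;
}

// Passive: r0 + u + r1 + i + r2 (fuʿila), independent of infl1.
function form1_perfect_passive(string $r0, string $r1, string $r2): string
{
    return $r0 . 'u' . $r1 . 'i' . $r2;
}

echo form1_perfect_active('k', 't', 'b', 'u'), "\n";   // katub
echo form1_perfect_passive('k', 't', 'b'), "\n";       // kutib
```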

Form II — Perfect Active (2-1)

F = "2-1", root = drs
→ $latin = "darrasa"
→ Pattern: d a r r a s a → dar.ra.sa (faʿʿala)
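
The Form II stem differs from Form I mainly in the doubled middle radical. A minimal sketch, assuming the stem keeps the final a shown in the example; form2_perfect_active() is a hypothetical name, and returning the doubled radical next to the stem only mirrors the $doubledRadical slot for illustration.

```php
<?php
// Hypothetical sketch of the Form II perfect active stem (faʿʿala).
// Returns the stem together with the radical that was doubled.
function form2_perfect_active(string $r0, string $r1, string $r2): array
{
    $latin = $r0 . 'a' . $r1 . $r1 . 'a' . $r2 . 'a';
    return [$latin, $r1];
}

[$latin, $doubled] = form2_perfect_active('d', 'r', 's');
echo $latin, " (doubled: ", $doubled, ")\n";   // darrasa (doubled: r)
```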

Form VIII — Perfect Active with Assimilation (8-1)

Special rules apply when the first radical is dental / emphatic (t, d, z, dh, S, D, T, th, Z). For example, with r0 = d:

F = "8-1", root = d r k
→ $latin = "eiddarak"
→ Internally: ei + d + da + r + a + k
→ Represents the assimilation of underlying idtarak(a) to iddarak(a).

These patterns feed directly into latinToArabic() to generate correct assimilated Arabic forms.

4. Imperfect Active / Passive Examples

Form I — Imperfect Active (1-2 to 1-6)

F = "1-2", root = ktb, infl2 = 'u'
→ $latin = "aktubu"
Slots:
  [0] "aktubu"  // stem base ("ktub") with an initial 'a' used in construction
  [1] 'a'       // imperfect vowel slot
  [2] ''        // no form prefix
  [3] ''        // V1 (perfect) not used here
  [4] 'u'       // imperfect internal vowel (yaktubu)
  [5] ''        // no doubled radical

The Stem Tester then adds the person prefix (ya, ta, na, ea) and suffixes to produce yaktubu, taktubu, etc.
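
The slot layout and the prefixing step can be sketched as follows. Both functions are hypothetical. The sketch assumes that, because slot [0] already begins with the prefix vowel, only the prefix consonant (y, t, n, e) is prepended; suffix handling is left out entirely.

```php
<?php
// Hypothetical sketch: rebuilds the documented "1-2" slots for a sound root.
function form1_imperfect_slots(string $r0, string $r1, string $r2, string $infl2 = 'u'): array
{
    $latin = 'a' . $r0 . $r1 . $infl2 . $r2 . 'u';
    return [$latin, 'a', '', '', $infl2, ''];
}

// Assumption: the person prefix contributes only its consonant here,
// since slot [0] already carries the prefix vowel.
function attach_person_prefix(string $prefixConsonant, array $slots): string
{
    return $prefixConsonant . $slots[0];
}

$slots = form1_imperfect_slots('k', 't', 'b');
foreach (['y', 't', 'n', 'e'] as $consonant) {
    echo attach_person_prefix($consonant, $slots), "\n";
}
// yaktubu, taktubu, naktubu, eaktubu (one per line)
```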

Form II — Imperfect Active (2-2 to 2-6)

F = "2-2", root = drs
→ $latin = "udarrisu"
Pattern: u + d a r r i s u → yudarrisu

Imperfect Passive (1-8 to 1-12, etc.)

F = "1-8", root = ktb
→ $latin = "yuktabu"
Conceptually: y u k t a b u → “it is written” (passive imperfect)

5. Special Weak Patterns (mithāl, hollow, defective, doubled, hamzated)

For weak verbs, the Stem Tester chooses a custom code instead of a simple form-tense:

  • mithāl: initial weak (w / y), e.g., 'mth12', 'mth13'
  • hollow: middle weak (e.g. qāla, qāma): 'hol11', 'hol12'
  • defective: final weak (e.g. daʿā, ramā): 'def21'
  • doubled: shadda verbs: 'dup1', 'dup7', etc.
  • hamzated: roots with hamza: 'hmz01', 'hmz07', etc.
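
A rough sketch of how a root could be sorted into these categories before a special code is chosen. This is not ACE's actual selection logic: classify_root() is hypothetical, it assumes 'w' and 'y' mark weak radicals and 'e' is the Latin symbol for hamza, and the order of the checks (which decides roots that fit two categories) is arbitrary. It returns only the tag, since the meaning of the numeric suffixes is not described here.

```php
<?php
// Hypothetical root classifier (not ACE's real logic).
// Assumptions: 'w'/'y' are weak radicals, 'e' stands for hamza,
// and the check order below breaks ties between categories.
function classify_root(string $r0, string $r1, string $r2): string
{
    $weak = ['w', 'y'];
    if ($r1 === $r2) {
        return 'dup';       // doubled: second and third radicals identical
    }
    if (in_array('e', [$r0, $r1, $r2], true)) {
        return 'hmz';       // hamzated: any radical is hamza
    }
    if (in_array($r0, $weak, true)) {
        return 'mth';       // mithāl: initial weak
    }
    if (in_array($r1, $weak, true)) {
        return 'hol';       // hollow: middle weak
    }
    if (in_array($r2, $weak, true)) {
        return 'def';       // defective: final weak
    }
    return 'regular';
}

echo classify_root('q', 'w', 'l'), "\n";   // hol
echo classify_root('k', 't', 'b'), "\n";   // regular
```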

Each of these cases returns pre-calculated Latin patterns that align with:

  • Known paradigms from classical naḥw / ṣarf texts.
  • Tajwīd and pronunciation expectations.
  • Root-based Qur’ān search patterns.

6. Derived Nouns, Adjectives, & Broken Plurals (Tenses 18–23)

For non-verb categories, build_combined_token() also contains patterns like:

  • '18-1' — noun of place: ma + r0 + r1 + a + r2
  • '19-1', '19-2' — instrument nouns: mi + r0 + r1 + (a/aA) + r2
  • '20-x' — adjectives
  • '21-x' — exaggerated (intensive) nouns
  • '22-x' — colors & defects
  • '23-x' — broken plurals (e.g., eaqlaAm, jibaAl, quluwb)
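
The place and instrument templates listed above translate directly into concatenation. A minimal sketch with hypothetical function names; which of '19-1' and '19-2' uses the short a and which the long aA is not stated here, so the sketch exposes that choice as a flag instead of guessing.

```php
<?php
// Hypothetical sketches of two noun templates from the list above.
// Noun of place: ma + r0 + r1 + a + r2.
function place_noun(string $r0, string $r1, string $r2): string
{
    return 'ma' . $r0 . $r1 . 'a' . $r2;
}

// Instrument noun: mi + r0 + r1 + (a or aA) + r2.
// $longVowel picks the aA variant; its mapping to '19-1' / '19-2' is not assumed.
function instrument_noun(string $r0, string $r1, string $r2, bool $longVowel = false): string
{
    return 'mi' . $r0 . $r1 . ($longVowel ? 'aA' : 'a') . $r2;
}

echo place_noun('k', 't', 'b'), "\n";              // maktab
echo instrument_noun('b', 'r', 'd'), "\n";         // mibrad
echo instrument_noun('z', 'm', 'r', true), "\n";   // mizmaAr
```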

These Latin patterns are fed into the same pipeline as verbs: article + stem + declension suffix + pronouns → Arabic + IPA + translation.

7. Default Fallback

If a particular $F is not handled, the function falls back to:

default:
    $latin = $r0.'a'.$r1.'a'.$r2;   // faʿala
    return [$latin, '', '', 'a', 'a', ''];

This ensures that the Stem Tester still displays something, even for unimplemented combinations, instead of crashing.
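
The fallback behaviour can be illustrated with a miniature dispatcher. build_token_sketch() is hypothetical and handles a single case; the empty non-$latin slots in the '1-7' branch are placeholders, not documented values.

```php
<?php
// Hypothetical miniature dispatcher showing the default fallback.
function build_token_sketch(string $F, string $r0, string $r1, string $r2): array
{
    switch ($F) {
        case '1-7':
            // Perfect passive stem; the remaining slots are placeholders.
            return [$r0 . 'u' . $r1 . 'i' . $r2, '', '', '', '', ''];
        default:
            // Unhandled code: fall back to the faʿala shape.
            $latin = $r0 . 'a' . $r1 . 'a' . $r2;
            return [$latin, '', '', 'a', 'a', ''];
    }
}

echo build_token_sketch('1-7', 'k', 't', 'b')[0], "\n";    // kutib
echo build_token_sketch('99-9', 'k', 't', 'b')[0], "\n";   // katab
```

Note that the fallback result looks the same as a genuine faʿala stem, so a caller that needs to detect unimplemented codes would have to check for them separately.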