build_combined_token() — Pattern Engine Documentation
1. Purpose & Signature
The function build_combined_token() is the core pattern engine
used by ACE to generate Latin-transliterated stems and internal vowels for:
- Perfect / imperfect verbs (active and passive)
- Imperatives / prohibitives
- Active & passive participles
- Verbal nouns (maṣdar)
- Derived nouns (place, instrument, adjectives, exaggeration, colors, broken plurals, etc.)
- Special weak patterns: mithāl, hollow, defective, doubled, hamzated verbs
function build_combined_token(
$F, // pattern code, usually "$form-$tense" (e.g. "1-2", "4-7", "8-1")
// or a special code: "mth12", "hol11", "def21", "dup1", "hmz01", etc.
$r0, $r1, $r2, // triliteral root letters (Latin: k, t, b, q, w, etc.)
$pronIdx, // pronoun index (1–14); used by some hollow / mithāl / Form VIII rules
$infl1 = 'a', // first internal vowel (faʿala vs faʿila vs faʿula)
$infl2 = 'u', // second internal / imperfect vowel (yaktubu vs yaktibu vs yaktabu)
$tense = '1' // tense index, mainly for reference
)
It returns a six-element array:
[ $latin, $impVowel, $formPrefix, $V1, $V2, $doubledRadical ]
- $latin: the core Latin pattern (the stem, sometimes with part of the prefix).
- $impVowel: the imperfect vowel (a or u), where relevant.
- $formPrefix: a form-specific prefix, e.g., ta, sta, n.
- $V1, $V2: internal vowel markers used by the analyzer and by transliteration.
- $doubledRadical: which radical is doubled (e.g., $r1 in Form II).
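Because the result is positional, callers have to remember the slot order. The sketch below gives each slot a readable key. label_token() is a hypothetical helper, not part of ACE; it only assumes the six-slot order listed above, and the sample values are copied from the Form I imperfect example in section 4.

```php
<?php
// Hypothetical helper, not part of ACE: attach readable keys to the six
// positional slots returned by build_combined_token().
function label_token(array $token): array
{
    // Destructure in the documented order of the return array.
    [$latin, $impVowel, $formPrefix, $v1, $v2, $doubledRadical] = $token;

    return [
        'latin'          => $latin,
        'impVowel'       => $impVowel,
        'formPrefix'     => $formPrefix,
        'V1'             => $v1,
        'V2'             => $v2,
        'doubledRadical' => $doubledRadical,
    ];
}

// Slot values copied from the Form I imperfect example ("1-2", root ktb).
$parts = label_token(['aktubu', 'a', '', '', 'u', '']);
echo $parts['latin'], ' / ', $parts['V2'], PHP_EOL; // aktubu / u
```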
2. Pattern Code & Tense Mapping
For regular forms, the pattern code $F is:
$F = "$form-$tense"; // e.g., "1-2", "4-7", "3-17"
Example mappings:
- Form I, Tense 1 (Perfect Active): $F = "1-1"
- Form I, Tense 2 (Imperfect Active): $F = "1-2"
- Form I, Tense 7 (Perfect Passive): $F = "1-7"
- Form I, Tense 15 (Active Participle): $F = "1-15"
- Form I, Tense 17 (Verbal Noun): $F = "1-17"
- … and similarly for Forms II–X.
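The "$form-$tense" convention can be built and read back with a regular expression. This is a minimal sketch with hypothetical helper names; the tense names are only the ones listed above, and the guard for numbers above 10 reflects the noun codes such as "18-1" and "23-3" that reuse the same shape.

```php
<?php
// Hypothetical helper: build a regular pattern code "$form-$tense".
function pattern_code(int $form, int $tense): string
{
    return "$form-$tense";
}

// Hypothetical helper: describe a regular code, or return null for special
// codes such as "mth12" or "dup1" that do not follow the "$form-$tense" shape.
function describe_pattern_code(string $F): ?string
{
    // Tense indices named in this document; other indices are not listed here.
    $tenseNames = [
        1  => 'Perfect Active',
        2  => 'Imperfect Active',
        7  => 'Perfect Passive',
        15 => 'Active Participle',
        17 => 'Verbal Noun',
    ];
    if (!preg_match('/^(\d+)-(\d+)$/', $F, $m)) {
        return null;
    }
    $form  = (int) $m[1];
    $tense = (int) $m[2];
    if ($form > 10) {
        // Codes such as "18-1" or "23-3" name noun categories, not verb forms.
        return "Derived-noun code $F";
    }
    $name = $tenseNames[$tense] ?? 'tense not listed here';
    return "Form $form, Tense $tense ($name)";
}

echo describe_pattern_code(pattern_code(1, 17)), PHP_EOL; // Form 1, Tense 17 (Verbal Noun)
```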
The Stem Tester sometimes overrides $F with special codes for
weak, doubled, or hamzated roots, e.g.:
- 'mth12', 'mth13', …: mithāl (initial weak) patterns
- 'hol11', 'hol12', …: hollow verbs (e.g. qwl, qwm)
- 'def21', 'def22', …: defective verbs (final weak)
- 'dup1', 'dup7', 'dup…': doubled radicals
- 'hmz01', …: hamzated roots
- others such as 'mfr001', '18-1' (place noun), and '23-3' (broken plurals).
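When debugging, it can help to see which family a code belongs to. The sketch below classifies a code by its prefix. special_code_class() is hypothetical; it relies only on the prefixes listed above and does not interpret the numbering after them (e.g. the "12" in "mth12").

```php
<?php
// Hypothetical helper: guess the root class of a pattern code from its prefix.
// Only the prefixes listed in this document are known.
function special_code_class(string $F): string
{
    $prefixes = [
        'mth' => 'mithal (initial weak)',
        'hol' => 'hollow (middle weak)',
        'def' => 'defective (final weak)',
        'dup' => 'doubled',
        'hmz' => 'hamzated',
    ];
    foreach ($prefixes as $prefix => $class) {
        if (strncmp($F, $prefix, strlen($prefix)) === 0) {
            return $class;
        }
    }
    // Regular "$form-$tense" codes and numeric noun codes such as "18-1".
    if (preg_match('/^\d+-\d+$/', $F)) {
        return 'numeric code';
    }
    return 'other special code'; // e.g. "mfr001"
}

foreach (['hol12', 'dup7', '23-3', 'mfr001'] as $code) {
    echo $code, ': ', special_code_class($code), PHP_EOL;
}
```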
3. Perfect Active & Passive Examples
Form I — Perfect Active (1-1)
F = "1-1", root = ktb, infl1 = 'u' → $latin = "katub" → Pattern: k a t u b → ka.tu.ba (faʿula type; illustrative only, since the attested verb is kataba, produced with infl1 = 'a')
Form I — Perfect Passive (1-7)
F = "1-7", root = ktb → $latin = "kutib" → Pattern: k u t i b → ku.ti.ba (fuʿila)
Form II — Perfect Active (2-1)
F = "2-1", root = drs → $latin = "darrasa" → Pattern: d a r r a s a → dar.ra.sa (faʿʿala, middle radical doubled)
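The three perfect examples above come down to plain string concatenation of the radicals with fixed vowels. The functions below are a hypothetical re-creation of those documented outputs, not the ACE implementation; the final "a" in the Form II case is kept only because the documented Form II output includes it while the Form I outputs do not.

```php
<?php
// Form I perfect active: r0 + a + r1 + infl1 + r2 (infl1 selects fa'ala/fa'ila/fa'ula).
function form1_perfect_active(string $r0, string $r1, string $r2, string $infl1 = 'a'): string
{
    return $r0 . 'a' . $r1 . $infl1 . $r2;
}

// Form I perfect passive: r0 + u + r1 + i + r2 (fu'ila).
function form1_perfect_passive(string $r0, string $r1, string $r2): string
{
    return $r0 . 'u' . $r1 . 'i' . $r2;
}

// Form II perfect active: middle radical doubled (fa''ala), final 'a' included
// to match the documented "darrasa".
function form2_perfect_active(string $r0, string $r1, string $r2): string
{
    return $r0 . 'a' . $r1 . $r1 . 'a' . $r2 . 'a';
}

echo form1_perfect_active('k', 't', 'b', 'u'), PHP_EOL; // katub
echo form1_perfect_passive('k', 't', 'b'), PHP_EOL;     // kutib
echo form2_perfect_active('d', 'r', 's'), PHP_EOL;      // darrasa
```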
Form VIII — Perfect Active with Assimilation (8-1)
Special rules apply when the first radical is a dental or emphatic consonant (t, d, z, dh, S, D, T, th, Z), because the infixed t of Form VIII assimilates to it.
For example, with r0 = d:
F = "8-1", root = d r k → $latin = "eiddarak" → Internally: ei + d + da + r + a + k → represents iddaraka, the assimilated form of underlying idtaraka.
These patterns feed directly into latinToArabic(), which renders the
assimilated Arabic forms.
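The assimilation rule can be sketched as a choice of infix based on r0. This is a hypothetical illustration using standard Form VIII rules, not the ACE code: the output shape follows the decomposition ei + d + da + r + a + k (hamza + i, no final vowel), and only d, z, and the emphatics are modeled; dh and th are left unassimilated.

```php
<?php
// Hypothetical sketch of Form VIII (ifta'ala) with assimilation of the infixed t.
function form8_perfect_active(string $r0, string $r1, string $r2): string
{
    if (in_array($r0, ['d', 'z'], true)) {
        $infix = 'd'; // idtaraka -> iddaraka, iztahara -> izdahara
    } elseif (in_array($r0, ['S', 'D', 'T', 'Z'], true)) {
        $infix = 'T'; // emphatic: iṣtabara -> iṣṭabara (further ẓṭ -> ẓẓ not modeled)
    } else {
        $infix = 't'; // unchanged: iktataba; with r0 = t this already yields "tt"
    }
    return 'ei' . $r0 . $infix . 'a' . $r1 . 'a' . $r2;
}

echo form8_perfect_active('d', 'r', 'k'), PHP_EOL; // eiddarak
echo form8_perfect_active('S', 'b', 'r'), PHP_EOL; // eiSTabar
```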
4. Imperfect Active / Passive Examples
Form I — Imperfect Active (1-2 to 1-6)
F = "1-2", root = ktb, infl2 = 'u'
→ $latin = "aktubu"
Slots:
[0] "aktubu" // stem base ("ktub") with an initial 'a' used in construction
[1] 'a' // imperfect vowel slot
[2] '' // no form prefix
[3] '' // V1 (perfect) not used here
[4] 'u' // imperfect internal vowel (yaktubu)
[5] '' // no doubled radical
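The Stem Tester then adds the person prefix (ya, ta, na, ea) and any suffixes
to produce yaktubu, taktubu, etc. Because $latin already begins with a, the prefix presumably contributes only its consonant here; otherwise the vowel would be doubled.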
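A sketch of that last step, under the assumption that the leading "a" of $latin is the prefix vowel. imperfect_word() and its person labels are hypothetical (they are not ACE's 1–14 $pronIdx values), and suffixes for dual or plural persons are not modeled.

```php
<?php
// Hypothetical sketch: prepend only the person-prefix consonant to $latin.
function imperfect_word(string $latin, string $person): string
{
    $prefix = [
        '3ms' => 'y', // he:  yaktubu
        '2ms' => 't', // you: taktubu (3fs "she" uses t as well)
        '1p'  => 'n', // we:  naktubu
        '1s'  => 'e', // I:   eaktubu, with e = hamza as in "eaqlaAm"
    ];
    if (!array_key_exists($person, $prefix)) {
        throw new InvalidArgumentException("Person not covered by this sketch: $person");
    }
    return $prefix[$person] . $latin;
}

foreach (['3ms', '2ms', '1p', '1s'] as $p) {
    echo imperfect_word('aktubu', $p), PHP_EOL;
}
```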
Form II — Imperfect Active (2-2 to 2-6)
F = "2-2", root = drs → $latin = "udarrisu" → Pattern: u + d a r r i s u → yudarrisu (after the person prefix is added)
Imperfect Passive (1-8 to 1-12, etc.)
F = "1-8", root = ktb → $latin = "yuktabu" → Conceptually: y u k t a b u → “it is written” (passive imperfect). Unlike the "1-2" output, this one already includes the prefix y.
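Both imperfect patterns in this part can be sketched the same way as the perfects. These are hypothetical functions, not the ACE code; they return stems that start with the prefix vowel u, as in "udarrisu", and the caller adds y. Since the documented "1-8" output already includes y, the real function evidently does not split the passive prefix this way.

```php
<?php
// Form II imperfect active: u + r0 + a + r1 r1 + i + r2 + u (yufa''ilu).
function form2_imperfect_active(string $r0, string $r1, string $r2): string
{
    return 'u' . $r0 . 'a' . $r1 . $r1 . 'i' . $r2 . 'u';
}

// Form I imperfect passive: u + r0 + r1 + a + r2 + u (yuf'alu).
function form1_imperfect_passive(string $r0, string $r1, string $r2): string
{
    return 'u' . $r0 . $r1 . 'a' . $r2 . 'u';
}

echo 'y' . form2_imperfect_active('d', 'r', 's'), PHP_EOL;  // yudarrisu
echo 'y' . form1_imperfect_passive('k', 't', 'b'), PHP_EOL; // yuktabu
```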
5. Special Weak Patterns (mithāl, hollow, defective, doubled, hamzated)
For weak verbs, the Stem Tester chooses a custom code instead of a simple form-tense code:
- mithāl: initial weak (w / y), e.g., 'mth12', 'mth13', …
- hollow: middle weak (e.g. qāla, qāma): 'hol11', 'hol12', …
- defective: final weak (e.g. daʿā, ramā): 'def21', …
- doubled: shadda verbs: 'dup1', 'dup7', etc.
- hamzated: roots with hamza: 'hmz01', 'hmz07', etc.
Each of these cases returns precomputed Latin patterns intended to match:
- known paradigms from classical nahw / ṣarf texts,
- the expected Tajwīd and pronunciation,
- root-based Qur’ān search patterns.
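To show the kind of rule such precomputed patterns encode, here are two classical weak-verb rules as hypothetical PHP functions. They are illustrations only and are not mapped to the 'hol…' or 'mth…' codes, whose numbering is not documented here; "aA" marks a long a, as in "jibaAl".

```php
<?php
// Hollow verb, Form I perfect active, 3rd masc. sing. stem: the middle w/y
// surfaces as long a (q-w-l -> qaAl, i.e. qāla). Before consonant-initial
// suffixes the stem shortens (qultu), which this sketch does not model.
function hollow_perfect_active(string $r0, string $r1, string $r2): string
{
    if (!in_array($r1, ['w', 'y'], true)) {
        throw new InvalidArgumentException('Hollow verbs need w or y as r1');
    }
    return $r0 . 'aA' . $r2;
}

// Mithāl verb with initial w and imperfect vowel i: the w drops in the
// imperfect (w-j-d -> ajidu, i.e. yajidu once the prefix y is added).
function mithal_imperfect_active(string $r0, string $r1, string $r2): string
{
    if ($r0 !== 'w') {
        throw new InvalidArgumentException('This rule covers initial w only');
    }
    return 'a' . $r1 . 'i' . $r2 . 'u';
}

echo hollow_perfect_active('q', 'w', 'l'), PHP_EOL;   // qaAl
echo mithal_imperfect_active('w', 'j', 'd'), PHP_EOL; // ajidu
```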
6. Derived Nouns, Adjectives, & Broken Plurals (Tenses 18–23)
For non-verb categories, build_combined_token() also contains patterns like:
- '18-1': noun of place: ma + r0 + r1 + a + r2
- '19-1', '19-2': instrument nouns: mi + r0 + r1 + (a/aA) + r2
- '20-x': adjectives
- '21-x': exaggerated (intensive) nouns
- '22-x': colors and defects
- '23-x': broken plurals (e.g., eaqlaAm, jibaAl, quluwb)
These Latin patterns are fed into the same pipeline as verbs: article + stem + declension suffix + pronouns → Arabic + IPA + translation.
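The place-noun and instrument-noun formulas given for '18-1' and '19-1'/'19-2' translate directly into code. This is a sketch of those formulas only, not the ACE implementation; which of the two instrument codes selects the long vowel aA is not stated, so it is left to a hypothetical $long flag.

```php
<?php
// Noun of place ("18-1"): ma + r0 + r1 + a + r2 (maf'al).
function place_noun(string $r0, string $r1, string $r2): string
{
    return 'ma' . $r0 . $r1 . 'a' . $r2;
}

// Instrument noun ("19-1"/"19-2"): mi + r0 + r1 + (a | aA) + r2.
function instrument_noun(string $r0, string $r1, string $r2, bool $long = false): string
{
    return 'mi' . $r0 . $r1 . ($long ? 'aA' : 'a') . $r2;
}

echo place_noun('k', 't', 'b'), PHP_EOL;            // maktab
echo instrument_noun('b', 'r', 'd'), PHP_EOL;       // mibrad
echo instrument_noun('n', 'q', 'r', true), PHP_EOL; // minqaAr
```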
7. Default Fallback
If a particular $F is not handled, the function falls back to:
default:
$latin = $r0.'a'.$r1.'a'.$r2; // faʿala
return [$latin, '', '', 'a', 'a', ''];
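This ensures that the Stem Tester still displays something for unimplemented combinations instead of crashing. The fallback is a generic faʿala stem, so its output is a placeholder, not a correct form for the requested code.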
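A miniature dispatcher shows how the default branch behaves. mini_combined_token() is hypothetical, not the ACE source: only the "1-7" case is reproduced, and its non-$latin slots are left empty because their real values are not documented here.

```php
<?php
// Hypothetical miniature dispatcher illustrating the documented fallback.
function mini_combined_token(string $F, string $r0, string $r1, string $r2): array
{
    switch ($F) {
        case '1-7': // Form I perfect passive (fu'ila), documented as "kutib"
            return [$r0 . 'u' . $r1 . 'i' . $r2, '', '', '', '', ''];
        default:    // unhandled code: the documented fa'ala fallback
            $latin = $r0 . 'a' . $r1 . 'a' . $r2;
            return [$latin, '', '', 'a', 'a', ''];
    }
}

var_dump(mini_combined_token('1-7', 'k', 't', 'b')[0]);  // string(5) "kutib"
var_dump(mini_combined_token('99-9', 'k', 't', 'b')[0]); // string(5) "katab"
```

Because the fallback looks like an ordinary Form I perfect, a result such as "katab" for an unhandled code cannot be told apart from a genuine "1-1" result by its output alone.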