ACE — Arabic Tools

ACE — Reverse Analyzer Flowchart (Current Engine / File-Aware)

This file documents the current ACE reverse-analysis strategy used by ace_reverse_assessor.php and its helper/rule files. It replaces the older flowchart model shown in the earlier document.

Current design principle:
ACE does not force a token into a category too early. Instead, it collects evidence from exact matching, proclitic handling, suffix families, verb/noun routers, root support, dictionary support, and context from nearby particles, then ranks candidates.

Stage 0 — Input Intake and Token Preparation

0.1 Main file receives token input

Handled in ace/tools/ace_reverse_assessor.php
Input may come from GET, POST, test token arrays, or plain text with spaces
Text normalization helpers split commas / spaces / newlines into token lists
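
Illustrative sketch (Python, although the engine itself is PHP): the comma / space / newline splitting described above can be pictured as below. The name split_tokens is hypothetical and is not taken from ace_reverse_assessor.php.

```python
import re

def split_tokens(text: str) -> list[str]:
    """Split raw input on commas and any whitespace, dropping empty pieces."""
    return [piece for piece in re.split(r"[,\s]+", text) if piece]

# Token order is preserved, which later context stages rely on.
print(split_tokens("wa qad, kataba\nlan yaktuba"))
```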

0.2 Normalize token surface

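Lowercase transliteration where case is not distinctive (the examples in this document keep uppercase symbols such as A in aA and H in HattaA)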
Trim whitespace
Preserve digraph units such as sh, kh, dh, gh, th
Preserve long vowels and sequences such as aA, iy, uw, AA
No POS assignment yet
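
Illustrative sketch (Python, not the engine's PHP): one way to keep digraphs and long-vowel sequences intact is to segment the trimmed token into units, trying two-character units before single characters. The function name segment and the greedy strategy are assumptions. Case folding is left out because the examples in this document use case-significant symbols.

```python
DIGRAPHS = ("sh", "kh", "dh", "gh", "th")
LONG_UNITS = ("aA", "iy", "uw", "AA")

def segment(token: str) -> list[str]:
    """Split a transliterated token into units, keeping two-character units whole."""
    token = token.strip()
    units = []
    i = 0
    while i < len(token):
        pair = token[i:i + 2]
        if pair in DIGRAPHS or pair in LONG_UNITS:
            units.append(pair)   # atomic two-character unit
            i += 2
        else:
            units.append(token[i])
            i += 1
    return units

print(segment(" shaAhada "))
```

Limitation of the sketch: greedy matching treats every iy / uw as a long vowel, so a consonantal y or w after i / u would need an extra check.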

0.3 Shared regex and suffix arrays loaded first

From ace/tools/bootstrap_rules.php
Includes consonant/vowel primitives such as ACE_C
Includes shared suffix catalogs such as: $PERF, $IMPF, $OBJ, $AMR, $NOMINAL

Stage 1 — Context Capture Before Analysis

1.1 Preserve token sequence

ACE reads tokens in order, not only as isolated forms
The previous token may govern the mood or strengthen one family of analysis

1.2 Particle context can affect candidate ranking

Perfect-supporting particles such as qad
Subjunctive governors such as lan, ean, kay, HattaA
Jussive governors such as lam, lammaA, and command-related li
Future or modal support such as sa, sawfa

1.3 Context influences weighting, not blind replacement

A governing particle does not automatically erase all other analyses, but it can raise the score of a matching imperfect or perfect candidate.
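
Illustrative sketch (Python, not the engine's PHP): the weighting idea can be pictured as a bonus added to matching candidates, with every candidate kept in the list. The table name PARTICLE_HINTS, the candidate keys, and the bonus value are assumptions. The attached particles sa and li are omitted because they do not arrive as a separate preceding token.

```python
PARTICLE_HINTS = {
    "qad": {"family": "perfect"},
    "lan": {"family": "imperfect", "mood": "subjunctive"},
    "ean": {"family": "imperfect", "mood": "subjunctive"},
    "kay": {"family": "imperfect", "mood": "subjunctive"},
    "HattaA": {"family": "imperfect", "mood": "subjunctive"},
    "lam": {"family": "imperfect", "mood": "jussive"},
    "lammaA": {"family": "imperfect", "mood": "jussive"},
    "sawfa": {"family": "imperfect"},
}

def apply_context(prev_token, candidates, bonus=0.2):
    """Raise the score of candidates that agree with the previous particle."""
    hint = PARTICLE_HINTS.get(prev_token or "")
    rescored = []
    for cand in candidates:
        score = cand["score"]
        if hint and all(cand.get(key) == value for key, value in hint.items()):
            score += bonus            # agreement is rewarded
        rescored.append({**cand, "score": score})   # nothing is discarded
    return sorted(rescored, key=lambda c: c["score"], reverse=True)
```

With a preceding lam, a jussive imperfect candidate scored 0.4 overtakes a nominal candidate scored 0.5, but the nominal candidate remains available as an alternate.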

Stage 2 — Early Left-Edge Proclitic Handling

2.1 Strip/record conjunction and attached particles from the left edge

Done early in the main engine/helpers, before deeper morphology
Typical proclitics include: wa, fa, la, bi, li
Combined openings such as wal and fal must be supported when valid

2.2 Prefix is stored separately

The stripped left piece is recorded as prefix
The remainder is sent to exact lookup and morphological routers
The prefix is kept out of root extraction so that it cannot be mistaken for a radical

2.3 Re-test full token and remainder

The raw token may be exact
The stripped remainder may also be exact
The stripped remainder may expose imperfect / imperative / nominal evidence
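
Illustrative sketch (Python, not the engine's PHP): proclitic handling can be pictured as producing several (prefix, rest) pairs, so that both the raw token and each stripped remainder are re-tested. The function name proclitic_splits and the minimum remainder length are assumptions.

```python
# Combined openings are listed first so they are tried before their shorter parts.
PROCLITICS = ("wal", "fal", "wa", "fa", "la", "bi", "li")

def proclitic_splits(token: str, min_rest: int = 2) -> list[tuple[str, str]]:
    """Return (prefix, rest) pairs: the untouched token first, then each possible peel."""
    splits = [("", token)]
    for proclitic in PROCLITICS:
        rest = token[len(proclitic):]
        if token.startswith(proclitic) and len(rest) >= min_rest:
            splits.append((proclitic, rest))
    return splits

print(proclitic_splits("wakataba"))
```

Each pair is only a hypothesis. Later stages decide whether the remainder is an exact entry or a plausible verbal or nominal stem.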

Stage 3 — Exact / Non-Derived Protection Layer

3.1 Exact lookup happens early

Main helpers in tools_functions2.php consult local non-derived data
Primary source: ace/data/include_particles.php
Categories include particles, pronouns, prepositions, conjunctions, demonstratives, relatives, interrogatives, and other fixed entries

3.2 Exact lookup is tested on full token and valid stripped variants

Full surface token first
Then post-prefix remainder where appropriate
Then simple exact-plus-suffix variants where supported

3.3 Exact candidates should not be swallowed by morphology

If a token is a strong exact / non-derived match, ACE should preserve that instead of misclassifying it as a verb or noun merely because the letters resemble a pattern.
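
Illustrative sketch (Python, not the engine's PHP): the lookup order in 3.2 can be pictured as below. The sample entries, the short proclitic and tail lists, and the result keys are hypothetical. The real data lives in ace/data/include_particles.php and is not reproduced here.

```python
# Hypothetical sample of non-derived entries.
NON_DERIVED = {"min": "preposition", "huwa": "pronoun", "hal": "interrogative"}
PROCLITICS = ("wa", "fa")
TAILS = ("hum", "hu", "ka", "naA")

def exact_lookup(token: str) -> list[dict]:
    hits = []
    # Full surface token first, then each post-prefix remainder.
    variants = [("", token)]
    variants += [(p, token[len(p):]) for p in PROCLITICS if token.startswith(p)]
    for prefix, rest in variants:
        if rest in NON_DERIVED:
            hits.append({"prefix": prefix, "exact": rest, "suffix": "",
                         "category": NON_DERIVED[rest]})
            continue
        # Simple exact-plus-suffix variants.
        for tail in TAILS:
            base = rest[:-len(tail)]
            if rest.endswith(tail) and base in NON_DERIVED:
                hits.append({"prefix": prefix, "exact": base, "suffix": tail,
                             "category": NON_DERIVED[base]})
    return hits

print(exact_lookup("wahuwa"))
print(exact_lookup("minhum"))
```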

Stage 4 — Controlled Right-Side Stripping

4.1 Object pronouns are handled as detachable tails

Examples: hu, hum, humaA, hunna, haA, ka, ki, kum, kumaA, naA, niy
These are stored as removed tail or object suffix evidence

4.2 Right-side stripping is family-aware

Perfect endings are tested in perfect routing
Imperfect endings are tested in imperfect routing
Imperative endings are tested in imperative routing
Nominal declension endings are tested in nominal routing

This is one of the biggest differences from the old flowchart. ACE no longer does one blind global strip pass and then guesses.

4.3 Shared helper behavior

Helpers such as suffix-variant builders in tools_functions2.php generate alternate stem states
These alternates are passed into family-specific rule chains
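
Illustrative sketch (Python, not the engine's PHP): family-aware stripping can be pictured as generating stem variants per family, so that a perfect ending is never stripped on behalf of the nominal router. The ending lists are only the examples quoted in this document, not the full $PERF / $IMPF / $AMR / $NOMINAL catalogs. The function name and the minimum stem length are assumptions.

```python
SUFFIX_FAMILIES = {
    "perfect": ["tumaA", "tunna", "tum", "naA", "tu", "ta", "ti", "na", "at", "aA", "uw"],
    "imperfect": ["uwna", "iyna", "aAni", "aA", "iy", "uw", "na", "an", "in", "un", "u", "a", "i"],
    "imperative": ["iy", "aA", "uw", "na"],
    "nominal": ["uwna", "iyna", "aAni", "ayni", "un", "an", "in", "u", "a", "i"],
}

def stem_variants(rest: str, min_stem: int = 3) -> list[tuple[str, str, str]]:
    """Return (family, stem, ending) triples; each family strips only its own endings."""
    variants = []
    for family, endings in SUFFIX_FAMILIES.items():
        variants.append((family, rest, ""))          # unstripped state
        for ending in endings:
            stem = rest[:-len(ending)]
            if rest.endswith(ending) and len(stem) >= min_stem:
                variants.append((family, stem, ending))
    return variants

for variant in stem_variants("katabtum"):
    print(variant)
```

For katabtum only the perfect family yields a stripped stem (katab + tum). The other families receive the unstripped form and must justify it on their own terms.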

Stage 5 — Candidate Family Routing

5.1 The main competing families

Exact / Non-Derived
Perfect Verb
Imperfect Verb
Imperative Verb
Nominal / Participle / Verbal Noun

5.2 Families compete instead of replacing one another too early

Each family may emit one or more candidates
Each candidate keeps its own evidence trail
Later ranking decides which candidate is best

5.3 Candidate fields preserved

token
prefix
rest
suffix
removed tail
stem used
root
form
pattern
tense / voice
pronoun
confidence
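
Illustrative sketch (Python, not the engine's PHP): the preserved fields map naturally onto one record per candidate. The class name Candidate, the family and evidence fields, and the note helper are assumptions added to show how an evidence trail could travel with each candidate.

```python
from dataclasses import dataclass, field

@dataclass
class Candidate:
    token: str
    family: str                 # assumed extra field: which router emitted this
    prefix: str = ""
    rest: str = ""
    suffix: str = ""
    removed_tail: str = ""
    stem_used: str = ""
    root: str = ""
    form: str = ""
    pattern: str = ""
    tense_voice: str = ""
    pronoun: str = ""
    confidence: float = 0.0
    evidence: list[str] = field(default_factory=list)   # assumed evidence trail

    def note(self, reason: str, weight: float) -> None:
        """Record one piece of evidence and adjust confidence by its weight."""
        self.evidence.append(reason)
        self.confidence += weight

cand = Candidate(token="katabtum", family="perfect", rest="katabtum",
                 stem_used="katab", suffix="tum")
cand.note("perfect suffix tum", 0.3)
print(cand.confidence, cand.evidence)
```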

Stage 6 — Perfect Verb Router

6.1 Perfect active rules

Main rule file: ace/tools/rules/perfect_active.php
Tests Form I patterns and higher forms II–X
Includes strong, doubled, and derived patterns

6.2 Perfect passive rules

Main rule file: ace/tools/rules/perfect_passive.php
Uses passive vocalization patterns for Forms I–X where valid

6.3 Perfect suffix evidence

Uses $PERF family endings from bootstrap support
Examples include: tu, ta, ti, tum, tumaA, tunna, naA, na, at, aA, uw

6.4 Perfect is not defeated just because a noun-like shell exists

If perfect morphology plus suffix evidence plus root support is strong, it should outrank a weak nominal guess.

Stage 7 — Imperfect Verb Router

7.1 Imperfect prefixes

Main helpers detect prefixes such as: ya, yu, ta, tu, na, nu, ea, eu
This logic is handled in helpers and imperfect routing code

7.2 Imperfect active routes

Rule files include: imperfect_active_strong.php, imperfect_active_assimilated.php, imperfect_active_hollow.php, imperfect_active_duplicated.php

7.3 Imperfect passive routes

Rule files include passive counterparts such as: imperfect_passive_*.php
Passive and active are evaluated separately

7.4 Mood-aware imperfect handling

Indicative
Subjunctive
Jussive
Light emphatic
Heavy emphatic

Candidate selection may be strengthened by particle context from the preceding token.

7.5 Imperfect suffix family

Uses endings from $IMPF
Examples: u, a, i, an, in, un, aA, iy, na, uw, aAni, iyna, uwna
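
Illustrative sketch (Python, not the engine's PHP): imperfect routing can be pictured as pairing each possible prefix with each possible ending and keeping every split whose stem is long enough. The ending list repeats the examples above. The mood hints cover only a few unambiguous endings, and all names are assumptions.

```python
IMPF_PREFIXES = ("ya", "yu", "ta", "tu", "na", "nu", "ea", "eu")
IMPF_ENDINGS = ("uwna", "iyna", "aAni", "aA", "iy", "uw", "na",
                "an", "in", "un", "u", "a", "i", "")
# Partial mood hints; endings not listed here carry no hint in this sketch.
MOOD_HINTS = {
    "u": ["indicative"],
    "uwna": ["indicative"],
    "a": ["subjunctive"],
    "": ["jussive"],
    "uw": ["subjunctive", "jussive"],
}

def imperfect_splits(rest: str, min_stem: int = 3) -> list[dict]:
    out = []
    for prefix in IMPF_PREFIXES:
        if not rest.startswith(prefix):
            continue
        body = rest[len(prefix):]
        for ending in IMPF_ENDINGS:
            stem = body[:len(body) - len(ending)]
            if body.endswith(ending) and len(stem) >= min_stem:
                out.append({"prefix": prefix, "stem": stem, "suffix": ending,
                            "moods": MOOD_HINTS.get(ending, [])})
    return out

for split in imperfect_splits("yaktubuwna"):
    print(split)
```

Several splits are returned for one token. Particle context and pattern rules are what separate ya + ktub + uwna from the weaker alternatives.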

Stage 8 — Imperative Router

8.1 Imperative is its own family

Main rule file: ace/tools/rules/imperative.php
Imperative is not treated as a damaged imperfect
Restricted to second-person logic

8.2 Imperative endings

Uses the $AMR suffix family
Typical endings: Ø, iy, aA, uw, na

8.3 Imperative may carry attached object pronouns

ACE should recognize combinations such as imperative stem + ending + object tail
Example logic: kaAtib + iy + naA

8.4 Irregular and weak imperatives are integrated here

Examples include: qi, qul, kun, sir, khaf
Weak classes are not postponed to some final repair stage
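
Illustrative sketch (Python, not the engine's PHP): the stem + ending + object tail logic of 8.3 can be pictured as peeling the tail first and the imperative ending second. The tail list is a subset of the Stage 4 examples, the short-imperative set repeats the examples in 8.4, and the function name and minimum stem length are assumptions.

```python
OBJECT_TAILS = ("kumaA", "hunna", "hum", "kum", "haA", "naA", "niy", "hu", "ka", "ki", "")
AMR_ENDINGS = ("iy", "aA", "uw", "na", "")
SHORT_IMPERATIVES = {"qi", "qul", "kun", "sir", "khaf"}

def imperative_splits(rest: str, min_stem: int = 4) -> list[tuple[str, str, str]]:
    """Return (stem, ending, object_tail) hypotheses for an imperative reading."""
    out = []
    for tail in OBJECT_TAILS:
        if not rest.endswith(tail):
            continue
        core = rest[:len(rest) - len(tail)]
        for ending in AMR_ENDINGS:
            if not core.endswith(ending):
                continue
            stem = core[:len(core) - len(ending)]
            # Short weak / irregular stems are accepted by list, others by length.
            if stem in SHORT_IMPERATIVES or len(stem) >= min_stem:
                out.append((stem, ending, tail))
    return out

print(imperative_splits("kaAtibiynaA"))
print(imperative_splits("qul"))
```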

Stage 9 — Nominal / Participle / Masdar Routers

9.1 Nominal rule files

active_participles.php
passive_participles.php
verbal_nouns.php
adjectives.php
instruments.php
colors_and_defects.php
broken_plurals.php

9.2 Strict nominal policy

A bare stem with no valid nominal evidence must not automatically be labeled as a noun.

Prefer explicit declension support
Prefer strong nominal pattern support
Do not let weak noun guesses outrank strong verbal evidence

9.3 Nominal declension family

Uses $NOMINAL support from bootstrap_rules.php
Includes singular case endings, tanwīn, dual, sound plurals, etc.
Examples: u, a, i, un, an, in, aAni, ayni, uwna, iyna

9.4 Nominal prefixes are only evidence, not proof

Examples: mu, ma, mi, musta
These must still be supported by pattern and suffix evidence
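
Illustrative sketch (Python, not the engine's PHP): the strict nominal policy can be pictured as a gate that returns no nominal score unless a declension ending or a pattern match is present, with a nominal prefix counted only as a small bonus on top. The point values and names are assumptions. pattern_matched stands in for whatever the nominal rule files report.

```python
NOMINAL_ENDINGS = ("uwna", "iyna", "aAni", "ayni", "un", "an", "in", "u", "a", "i")
NOMINAL_PREFIXES = ("musta", "mu", "ma", "mi")

def nominal_points(rest: str, pattern_matched: bool) -> int:
    """Score nominal evidence in whole points; 0 means 'do not label as noun'."""
    has_ending = rest.endswith(NOMINAL_ENDINGS)
    has_prefix = rest.startswith(NOMINAL_PREFIXES)
    if not has_ending and not pattern_matched:
        return 0                      # bare stem, or prefix alone: no nominal label
    points = 0
    if has_ending:
        points += 4
    if pattern_matched:
        points += 4
    if has_prefix:
        points += 1                   # prefix is supporting evidence only
    return points

print(nominal_points("mudarris", False))
print(nominal_points("mudarrisuwna", True))
```

An ending such as a also closes many perfect verbs, so a score from the ending alone should still lose to strong verbal evidence at ranking time.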

Stage 10 — Root Skeleton, Form Detection, and Weak Classes

10.1 Skeleton extraction

Main helpers in tools_functions2.php extract consonantal frames
Digraph consonants remain atomic
Gemination / doubled radicals are allowed where required

10.2 Derived form detection

Form I: base classes and reduced variants
Form II: geminated second radical
Form III: aA after first radical
Form IV: ea / eu openings in valid environments
Form V: ta + Form II behavior
Form VI: ta + aA
Form VII: ein / in / ink patterns
Form VIII: inserted t, including assimilation behavior
Form IX: color/defect patterns
Form X: sta / sti

10.3 Weak and special root subclasses

Strong
Assimilated
Hollow
Defective
Doubled / duplicated
Hamzated
Doubly weak

10.4 Form VIII assimilation must be allowed

Inserted t may assimilate in surface forms
ACE should not require only textbook unassimilated shapes
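
Illustrative sketch (Python, not the engine's PHP): skeleton extraction with atomic digraphs, plus a toy check for Forms I to III, could look like the following. It covers only strong stems; weak classes, Form VIII assimilation, and the higher forms are out of scope. All names are assumptions rather than the helpers in tools_functions2.php.

```python
DIGRAPHS = ("sh", "kh", "dh", "gh", "th")
LONG_VOWELS = ("aA", "iy", "uw", "AA")
SHORT_VOWELS = {"a", "i", "u", "A"}

def skeleton(stem: str) -> list[str]:
    """Return the consonantal frame, keeping digraph consonants as single units."""
    units = []
    i = 0
    while i < len(stem):
        pair = stem[i:i + 2]
        if pair in DIGRAPHS:
            units.append(pair)
            i += 2
        elif pair in LONG_VOWELS:
            i += 2
        elif stem[i] in SHORT_VOWELS:
            i += 1
        else:
            units.append(stem[i])
            i += 1
    return units

def guess_form(stem: str):
    """Return (form, radicals) for strong Form I-III stems, else (None, skeleton)."""
    sk = skeleton(stem)
    # Form II: second radical written twice with no vowel between the copies.
    if len(sk) == 4 and sk[1] == sk[2] and sk[1] + sk[2] in stem:
        return "II", [sk[0], sk[1], sk[3]]
    # Form III: long aA directly after the first radical.
    if len(sk) == 3 and stem.startswith(sk[0] + "aA"):
        return "III", sk
    if len(sk) == 3:
        return "I", sk
    return None, sk

for stem in ("kataba", "kattaba", "kaAtaba", "shaAhada"):
    print(stem, guess_form(stem))
```

Limitation of the sketch: every iy / uw is read as a long vowel, so roots with a consonantal y or w need the weak-class handling listed in 10.3.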

Stage 11 — Dictionary and Translation Enrichment

11.1 Base root lookup

Translation support comes from: ace/tools/translation_functions.php
Dictionary data comes from: ace/data/dictionary_loader.php
Root-based lookup prefers a base/Form I semantic anchor

11.2 Candidate enrichment

Add English gloss
Add Urdu gloss
Add base English / base Urdu if available
Adjust meanings according to detected form or noun type

11.3 Root support improves confidence

A candidate with a valid supported root and usable gloss generally ranks above a structurally weak candidate with no lexical support.
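
Illustrative sketch (Python, not the engine's PHP): enrichment can be pictured as a root lookup that attaches base glosses, tags the gloss with the detected form, and raises confidence only when the root is found. The one-entry glossary, the key names, and the bonus value are hypothetical and do not reflect translation_functions.php or dictionary_loader.php.

```python
# Hypothetical one-entry glossary keyed by root.
ROOT_GLOSSES = {"ktb": {"en": "to write", "ur": "likhna"}}

def enrich(candidate: dict, glosses: dict = ROOT_GLOSSES, support_bonus: int = 2) -> dict:
    """Return a copy of the candidate with glosses and a lexical-support flag."""
    out = dict(candidate)
    entry = glosses.get(candidate.get("root", ""))
    if entry is None:
        out["lexical_support"] = False        # no gloss, no confidence bonus
        return out
    form = candidate.get("form", "I")
    tag = "" if form == "I" else f" (Form {form})"
    out.update(
        base_english=entry["en"],
        base_urdu=entry["ur"],
        english=entry["en"] + tag,
        urdu=entry["ur"] + tag,
        lexical_support=True,
        confidence=candidate.get("confidence", 0) + support_bonus,
    )
    return out

print(enrich({"root": "ktb", "form": "II", "confidence": 3}))
print(enrich({"root": "zzz", "confidence": 3}))
```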

Stage 12 — Ranking and Final Output

12.1 Rank all collected candidates

Exact match strength
Family-specific suffix validity
Pattern-form agreement
Root plausibility
Dictionary / translation support
Particle-context agreement
Penalty for over-stripping or implausible stems

12.2 Best candidate displayed first

The top candidate becomes the main visible analysis
Alternate matches may still be shown in the match bundle
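
Illustrative sketch (Python, not the engine's PHP): the ranking criteria in 12.1 can be pictured as a weighted sum over boolean evidence flags, with the top candidate shown first and the rest kept as alternates. The flag names and weights are assumptions. The document does not state the engine's actual scoring values.

```python
WEIGHTS = {
    "exact": 5,
    "suffix_valid": 3,
    "pattern_form": 3,
    "root_plausible": 2,
    "lexical_support": 2,
    "context_agree": 1,
    "over_stripped": -4,     # penalty for over-stripping or implausible stems
}

def score(candidate: dict) -> int:
    """Sum the weights of every evidence flag that is set on the candidate."""
    return sum(weight for flag, weight in WEIGHTS.items() if candidate.get(flag))

def rank(candidates: list[dict]) -> dict:
    ordered = sorted(candidates, key=score, reverse=True)
    return {
        "best": ordered[0] if ordered else None,
        "alternates": ordered[1:],
        "match_count": len(ordered),
    }

result = rank([
    {"family": "nominal", "pattern_form": True, "over_stripped": True},
    {"family": "perfect", "suffix_valid": True, "pattern_form": True,
     "root_plausible": True, "lexical_support": True},
])
print(result["best"]["family"], result["match_count"])
```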

12.3 Final row output fields

Token
Prefix
Rest
POS
Category
English
Urdu
Tense / Voice
Form
Pattern
Root
Suffix
Removed Tail
Stem Used
Pronoun
Confidence
Match Count

Stage 13 — Current Practical Rules

13.1 What ACE should no longer do

Do not do one blind suffix strip and then guess
Do not misread imperative as broken imperfect
Do not allow exact particles/pronouns to be swallowed by morphology
Do not label bare stems as nouns without real nominal evidence

13.2 What ACE should do

Protect exact/non-derived items early
Peel proclitics early and record them separately
Route into competing families
Strip suffixes in family-aware ways
Use particle context to break ties
Use dictionary/root support to strengthen valid analyses

13.3 Short summary of the current engine

Current ACE model: normalize → preserve context → peel proclitics → protect exact matches → route by family → test suffix families carefully → enrich → rank → display

This document supersedes the earlier ACE reverse flowchart and reflects the current file-aware router-based design.