ACE - Reverse Analyzer Flowchart (Current Engine / File-Aware)

This file documents the current ACE reverse-analysis strategy used by ace_reverse_assessor.php and its helper/rule files. It replaces the older flowchart model described in the earlier document.

Current design principle:
ACE does not force a token into a category too early. Instead, it collects evidence from exact matching, proclitic handling, suffix families, verb/noun routers, root support, dictionary support, and context from nearby particles, then ranks candidates.

Stage 0 - Input Intake and Token Preparation

0.1 Main file receives token input
  • Handled in ace/tools/ace_reverse_assessor.php
  • Input may come from GET, POST, test token arrays, or plain text with spaces
  • Text normalization helpers split commas / spaces / newlines into token lists
0.2 Normalize token surface
  • Lowercase transliteration
  • Trim whitespace
  • Preserve digraph units such as sh, kh, dh, gh, th
  • Preserve long vowels and sequences such as aA, iy, uw, AA
  • No POS assignment yet
0.3 Shared regex and suffix arrays loaded first
  • From ace/tools/bootstrap_rules.php
  • Includes consonant/vowel primitives such as ACE_C
  • Includes shared suffix catalogs such as: $PERF, $IMPF, $OBJ, $AMR, $NOMINAL
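
The intake and segmentation steps in 0.1 and 0.2 can be sketched as follows. This is a minimal Python illustration, not the engine's PHP code: the names split_tokens, segment, and UNIT_RE are hypothetical, and the unit list is copied from the digraphs and long-vowel sequences named above. Case handling is deliberately left out, since sequences such as aA are case-sensitive.

```python
import re

# Units kept atomic, taken from the lists in 0.2. Multi-character units are
# tried before the single-character fallback ".".
UNIT_RE = re.compile(r"sh|kh|dh|gh|th|aA|iy|uw|AA|.")

def split_tokens(text):
    """Split raw input on commas, spaces, and newlines, dropping empties."""
    return [t for t in re.split(r"[,\s]+", text.strip()) if t]

def segment(token):
    """Break one token into transliteration units, keeping digraphs whole."""
    return UNIT_RE.findall(token)

tokens = split_tokens("wa kataba,qad\nyaktubu")
units = segment("khaAlid")
```

No POS label is attached at this point; the output is only a token list and, per token, a list of surface units.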

Stage 1 - Context Capture Before Analysis

1.1 Preserve token sequence
  • ACE reads tokens in order, not only as isolated forms
  • The previous token may govern the mood or strengthen one family of analysis
1.2 Particle context can affect candidate ranking
  • Perfect-supporting particles such as qad
  • Subjunctive governors such as lan, ean, kay, HattaA
  • Jussive governors such as lam, lammaA, and command-related li
  • Future or modal support such as sa, sawfa
1.3 Context influences weighting, not blind replacement

A governing particle does not automatically erase all other analyses, but it can raise the score of a matching imperfect or perfect candidate.
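
A minimal Python sketch of context-as-weighting, assuming a simple lookup from the previous token to the analysis it supports. The table name PARTICLE_SUPPORT, the function context_bonus, the family/mood labels, and the bonus value 0.2 are all illustrative assumptions, not values taken from the engine. The attached items sa and li are left out of the lookup because they join the following word and would presumably be seen during proclitic handling instead.

```python
# Governing particles from 1.2 mapped to (family, mood) they support.
# A mood of None means the particle supports the family regardless of mood.
PARTICLE_SUPPORT = {
    "qad": ("perfect", None),
    "lan": ("imperfect", "subjunctive"),
    "ean": ("imperfect", "subjunctive"),
    "kay": ("imperfect", "subjunctive"),
    "HattaA": ("imperfect", "subjunctive"),
    "lam": ("imperfect", "jussive"),
    "lammaA": ("imperfect", "jussive"),
    "sawfa": ("imperfect", "indicative"),  # mood label is an assumption
}

def context_bonus(prev_token, family, mood=None, bonus=0.2):
    """Return a score bonus when the previous token supports this candidate.

    Non-matching candidates get 0.0 rather than being discarded, so context
    only re-weights the candidate list.
    """
    support = PARTICLE_SUPPORT.get(prev_token)
    if support is None:
        return 0.0
    want_family, want_mood = support
    if family != want_family:
        return 0.0
    if want_mood is not None and mood != want_mood:
        return 0.0
    return bonus
```

Because the function only returns an additive bonus, a perfect candidate after lam is still kept; it simply receives no support from the context.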

Stage 2 - Early Left-Edge Proclitic Handling

2.1 Strip/record conjunction and attached particles from the left edge
  • Done early in the main engine/helpers, before deeper morphology
  • Typical proclitics include: wa, fa, la, bi, li
  • Combined openings such as wal and fal must be supported when valid
2.2 Prefix is stored separately
  • The stripped left piece is recorded as prefix
  • The remainder is sent to exact lookup and morphological routers
  • Prefix is not allowed to corrupt root extraction
2.3 Re-test full token and remainder
  • The raw token may be exact
  • The stripped remainder may also be exact
  • The stripped remainder may expose imperfect / imperative / nominal evidence
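
The peel-and-retest idea in 2.1 to 2.3 can be sketched in Python as a function that returns every plausible (prefix, rest) split, with the unstripped token always first. The names PROCLITICS and proclitic_splits and the min_rest threshold are hypothetical; the proclitic list itself is the one given in 2.1.

```python
# Proclitics listed in 2.1, longest first so wal/fal are tried before wa/fa.
PROCLITICS = ["wal", "fal", "wa", "fa", "la", "bi", "li"]

def proclitic_splits(token, min_rest=2):
    """Return (prefix, rest) pairs: the raw token first, then each valid strip.

    The prefix is stored separately from the remainder, so later root
    extraction only ever sees `rest`.
    """
    splits = [("", token)]
    for p in PROCLITICS:
        rest = token[len(p):]
        if token.startswith(p) and len(rest) >= min_rest:
            splits.append((p, rest))
    return splits

splits = proclitic_splits("walkitaAb")
```

Returning all splits rather than only the longest one leaves the choice to later stages, which matches the rule that both the raw token and the remainder are re-tested.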

Stage 3 - Exact / Non-Derived Protection Layer

3.1 Exact lookup happens early
  • Main helpers in tools_functions2.php consult local non-derived data
  • Primary source: ace/data/include_particles.php
  • Categories include particles, pronouns, prepositions, conjunctions, demonstratives, relatives, interrogatives, and other fixed entries
3.2 Exact lookup is tested on full token and valid stripped variants
  • Full surface token first
  • Then post-prefix remainder where appropriate
  • Then simple exact-plus-suffix variants where supported
3.3 Exact candidates should not be swallowed by morphology

If a token is a strong exact / non-derived match, ACE should preserve that instead of misclassifying it as a verb or noun merely because the letters resemble a pattern.
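
A minimal Python sketch of the lookup order in 3.2: full token, then post-prefix remainder, then exact-plus-suffix. The EXACT table here is a four-entry stand-in written for the example; the real data lives in ace/data/include_particles.php and is not reproduced. The function name exact_candidates and the "via" field are hypothetical.

```python
# Tiny stand-in for the non-derived table (sample entries only).
EXACT = {"min": "preposition", "fiy": "preposition",
         "huwa": "pronoun", "qad": "particle"}

# Object tails from Stage 4.1, longest first.
OBJECT_TAILS = ["humaA", "hunna", "kumaA", "hum", "kum",
                "haA", "naA", "niy", "hu", "ka", "ki"]

def exact_candidates(token, rest=None):
    """Collect exact hits in the order given in 3.2."""
    hits = []
    if token in EXACT:
        hits.append({"match": token, "category": EXACT[token],
                     "suffix": "", "via": "full"})
    if rest and rest != token and rest in EXACT:
        hits.append({"match": rest, "category": EXACT[rest],
                     "suffix": "", "via": "rest"})
    # Exact entry plus one attached object tail, e.g. min + hu.
    for base in filter(None, dict.fromkeys([token, rest])):
        for tail in OBJECT_TAILS:
            head = base[: -len(tail)]
            if base.endswith(tail) and head in EXACT:
                hits.append({"match": head, "category": EXACT[head],
                             "suffix": tail, "via": "exact+suffix"})
    return hits
```

Any hit produced here would enter the candidate list before the morphological routers run, which is what keeps it from being swallowed by a look-alike verb or noun pattern.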

Stage 4 - Controlled Right-Side Stripping

4.1 Object pronouns are handled as detachable tails
  • Examples: hu, hum, humaA, hunna, haA, ka, ki, kum, kumaA, naA, niy
  • These are stored as removed tail or object suffix evidence
4.2 Right-side stripping is family-aware
  • Perfect endings are tested in perfect routing
  • Imperfect endings are tested in imperfect routing
  • Imperative endings are tested in imperative routing
  • Nominal declension endings are tested in nominal routing

This is one of the biggest differences from the old flowchart. ACE no longer does one blind global strip pass and then guesses.

4.3 Shared helper behavior
  • Helpers such as suffix-variant builders in tools_functions2.php generate alternate stem states
  • These alternates are passed into family-specific rule chains
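
Family-aware stripping can be illustrated with a small Python sketch in which each family only ever tests its own endings. The suffix lists are the examples quoted in this document, not the full $PERF, $IMPF, $AMR, and $NOMINAL catalogs from bootstrap_rules.php; the names FAMILY_SUFFIXES and stem_variants and the min_stem threshold are assumptions made for the example.

```python
# Example endings per family, copied from the lists in Stages 6 to 9.
FAMILY_SUFFIXES = {
    "perfect": ["tu", "ta", "ti", "tum", "tumaA", "tunna",
                "naA", "na", "at", "aA", "uw"],
    "imperfect": ["u", "a", "i", "an", "in", "un", "aA", "iy",
                  "na", "uw", "aAni", "iyna", "uwna"],
    "imperative": ["iy", "aA", "uw", "na"],
    "nominal": ["u", "a", "i", "un", "an", "in",
                "aAni", "ayni", "uwna", "iyna"],
}

def stem_variants(token, family, min_stem=3):
    """Return (stem, suffix) alternates for one family, unstripped form first."""
    variants = [(token, "")]
    # Longest endings first, so uwna is proposed before na or a.
    for suffix in sorted(FAMILY_SUFFIXES[family], key=len, reverse=True):
        stem = token[: -len(suffix)]
        if token.endswith(suffix) and len(stem) >= min_stem:
            variants.append((stem, suffix))
    return variants

perfect_view = stem_variants("katabtum", "perfect")
nominal_view = stem_variants("katabtum", "nominal")
```

The same surface token yields different stem states per family: the perfect router sees katab + tum, while the nominal router finds no valid declension ending and receives only the unstripped form.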

Stage 5 - Candidate Family Routing

5.1 The main competing families
  • Exact / Non-Derived
  • Perfect Verb
  • Imperfect Verb
  • Imperative Verb
  • Nominal / Participle / Verbal Noun
5.2 Families compete instead of replacing one another too early
  • Each family may emit one or more candidates
  • Each candidate keeps its own evidence trail
  • Later ranking decides which candidate is best
5.3 Candidate fields preserved
  • token
  • prefix
  • rest
  • suffix
  • removed tail
  • stem used
  • root
  • form
  • pattern
  • tense / voice
  • pronoun
  • confidence
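
One way to picture a candidate record is a small Python dataclass carrying the fields listed in 5.3. This is only a sketch: the class name Candidate, the extra family and evidence fields, and the note method are assumptions added to show how an evidence trail could travel with each candidate.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Candidate:
    token: str
    family: str                     # assumed field: which router emitted this
    prefix: str = ""
    rest: str = ""
    suffix: str = ""
    removed_tail: str = ""
    stem_used: str = ""
    root: Optional[str] = None
    form: Optional[str] = None
    pattern: Optional[str] = None
    tense_voice: Optional[str] = None
    pronoun: Optional[str] = None
    confidence: float = 0.0
    evidence: List[str] = field(default_factory=list)  # assumed field

    def note(self, reason, weight):
        """Record one piece of evidence and adjust confidence by its weight."""
        self.evidence.append(reason)
        self.confidence += weight

c = Candidate(token="wakataba", family="perfect", prefix="wa", rest="kataba")
c.note("perfect pattern matched", 0.5)
```

Keeping the evidence list on the candidate itself means the ranking stage can explain why one family won without re-running the routers.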

Stage 6 - Perfect Verb Router

6.1 Perfect active rules
  • Main rule file: ace/tools/rules/perfect_active.php
  • Tests Form I patterns and higher forms II–X
  • Includes strong, doubled, and derived patterns
6.2 Perfect passive rules
  • Main rule file: ace/tools/rules/perfect_passive.php
  • Uses passive vocalization patterns for Forms I–X where valid
6.3 Perfect suffix evidence
  • Uses $PERF family endings from bootstrap support
  • Examples include: tu, ta, ti, tum, tumaA, tunna, naA, na, at, aA, uw
6.4 Perfect is not defeated just because a noun-like shell exists

If perfect morphology plus suffix evidence plus root support is strong, it should outrank a weak nominal guess.

Stage 7 - Imperfect Verb Router

7.1 Imperfect prefixes
  • Main helpers detect prefixes such as: ya, yu, ta, tu, na, nu, ea, eu
  • This logic is handled in helpers and imperfect routing code
7.2 Imperfect active routes
  • Rule files include: imperfect_active_strong.php, imperfect_active_assimilated.php, imperfect_active_hollow.php, imperfect_active_duplicated.php
7.3 Imperfect passive routes
  • Rule files include passive counterparts such as: imperfect_passive_*.php
  • Passive and active are evaluated separately
7.4 Mood-aware imperfect handling
  • Indicative
  • Subjunctive
  • Jussive
  • Light emphatic
  • Heavy emphatic

Candidate selection may be strengthened by particle context from the preceding token.

7.5 Imperfect suffix family
  • Uses endings from $IMPF
  • Examples: u, a, i, an, in, un, aA, iy, na, uw, aAni, iyna, uwna
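
A minimal Python sketch of imperfect prefix detection and suffix-based mood hints. The prefix list is the one in 7.1. The mood table is an assumption that follows the standard sound-verb paradigm (final u indicative, final a subjunctive, no ending jussive, and loss of the final n in the long endings for subjunctive and jussive); it is not the engine's actual table, and the names split_imperfect and mood_options are hypothetical.

```python
IMPF_PREFIXES = ["ya", "yu", "ta", "tu", "na", "nu", "ea", "eu"]

# Mood hints by ending (assumed, sound-verb paradigm only).
MOODS_BY_SUFFIX = {
    "u": {"indicative"},
    "a": {"subjunctive"},
    "": {"jussive"},
    "uwna": {"indicative"},
    "iyna": {"indicative"},
    "aAni": {"indicative"},
    "uw": {"subjunctive", "jussive"},
    "iy": {"subjunctive", "jussive"},
    "aA": {"subjunctive", "jussive"},
}

def split_imperfect(rest, min_stem=3):
    """Return (prefix, remainder) pairs for every imperfect prefix that fits."""
    return [(p, rest[len(p):]) for p in IMPF_PREFIXES
            if rest.startswith(p) and len(rest) - len(p) >= min_stem]

def mood_options(suffix):
    """Return the moods compatible with an ending; empty set means no hint."""
    return set(MOODS_BY_SUFFIX.get(suffix, set()))
```

When the ending leaves more than one mood open, as with uw, a governing particle on the preceding token is the natural tie-breaker.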

Stage 8 - Imperative Router

8.1 Imperative is its own family
  • Main rule file: ace/tools/rules/imperative.php
  • Imperative is not treated as a damaged imperfect
  • Restricted to second-person logic
8.2 Imperative endings
  • Uses the $AMR suffix family
  • Typical endings: Ø, iy, aA, uw, na
8.3 Imperative may carry attached object pronouns
  • ACE should recognize combinations such as imperative stem + ending + object tail
  • Example logic: kaAtib + iy + naA
8.4 Irregular and weak imperatives are integrated here
  • Examples include: qi, qul, kun, sir, khaf
  • Weak classes are not postponed to some final repair stage
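
The stem + ending + object tail combination in 8.3 can be sketched in Python by enumerating every decomposition the two lists allow. The endings and tails are the ones quoted in this document (the empty string stands for the Ø ending and for "no object"); the function name imperative_splits and the min_stem value are assumptions. Second-person filtering and pattern checks are not modeled.

```python
# "" means no attached object pronoun.
OBJECT_TAILS = ["", "hu", "hum", "humaA", "hunna", "haA",
                "ka", "ki", "kum", "kumaA", "naA", "niy"]
# "" stands for the zero ending.
AMR_ENDINGS = ["", "iy", "aA", "uw", "na"]

def imperative_splits(token, min_stem=2):
    """Return every (stem, ending, tail) reading the two lists permit."""
    out = []
    for tail in OBJECT_TAILS:
        if tail and not token.endswith(tail):
            continue
        body = token[: len(token) - len(tail)]
        for ending in AMR_ENDINGS:
            if ending and not body.endswith(ending):
                continue
            stem = body[: len(body) - len(ending)]
            if len(stem) >= min_stem:
                out.append((stem, ending, tail))
    return out

readings = imperative_splits("kaAtibiynaA")
```

For kaAtibiynaA the result includes the intended reading ("kaAtib", "iy", "naA") alongside weaker ones; pattern and root checks in the rule file would then decide which survive. Short imperatives such as qi pass through unchanged because min_stem is 2.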

Stage 9 - Nominal / Participle / Masdar Routers

9.1 Nominal rule files
  • active_participles.php
  • passive_participles.php
  • verbal_nouns.php
  • adjectives.php
  • instruments.php
  • colors_and_defects.php
  • broken_plurals.php
9.2 Strict nominal policy

A bare stem with no valid nominal evidence must not automatically be labeled as a noun.

  • Prefer explicit declension support
  • Prefer strong nominal pattern support
  • Do not let weak noun guesses outrank strong verbal evidence
9.3 Nominal declension family
  • Uses $NOMINAL support from bootstrap_rules.php
  • Includes singular case endings, tanwīn, dual, sound plurals, etc.
  • Examples: u, a, i, un, an, in, aAni, ayni, uwna, iyna
9.4 Nominal prefixes are only evidence, not proof
  • Examples: mu, ma, mi, musta
  • These must still be supported by pattern and suffix evidence

Stage 10 - Root Skeleton, Form Detection, and Weak Classes

10.1 Skeleton extraction
  • Main helpers in tools_functions2.php extract consonantal frames
  • Digraph consonants remain atomic
  • Gemination / doubled radicals are allowed where required
10.2 Derived form detection
  • Form I: base classes and reduced variants
  • Form II: geminated second radical
  • Form III: aA after first radical
  • Form IV: ea / eu openings in valid environments
  • Form V: ta + Form II behavior
  • Form VI: ta + aA
  • Form VII: ein / in / ink patterns
  • Form VIII: inserted t, including assimilation behavior
  • Form IX: color/defect patterns
  • Form X: sta / sti
10.3 Weak and special root subclasses
  • Strong
  • Assimilated
  • Hollow
  • Defective
  • Doubled / duplicated
  • Hamzated
  • Doubly weak
10.4 Form VIII assimilation must be allowed
  • Inserted t may assimilate in surface forms
  • ACE should not require only textbook unassimilated shapes
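
A minimal Python sketch of skeleton extraction with atomic digraphs, plus two of the form hints from 10.2 (gemination for Form II, aA after the first radical for Form III). All names are hypothetical, and the sketch makes one loud simplification: iy and uw are always treated as long vowels, so weak radicals w and y written that way are dropped. Handling hollow, defective, and assimilated roots would need the weak-class logic described in 10.3.

```python
import re

UNIT_RE = re.compile(r"sh|kh|dh|gh|th|aA|iy|uw|AA|.")
VOWEL_UNITS = {"a", "i", "u", "aA", "iy", "uw", "AA"}

def skeleton(stem):
    """Return the consonantal frame, keeping digraph consonants as one unit."""
    return [u for u in UNIT_RE.findall(stem) if u not in VOWEL_UNITS]

def has_geminate(stem):
    """True when two identical consonant units are adjacent (Form II hint)."""
    units = UNIT_RE.findall(stem)
    return any(a == b and a not in VOWEL_UNITS
               for a, b in zip(units, units[1:]))

def looks_form_iii(stem):
    """True when aA directly follows the first consonant (Form III hint)."""
    units = UNIT_RE.findall(stem)
    return len(units) > 1 and units[0] not in VOWEL_UNITS and units[1] == "aA"
```

These checks are hints only: a geminate can also come from a doubled root, and an aA after the first radical also appears in active participles, so each hint still has to agree with suffix and pattern evidence.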

Stage 11 - Dictionary and Translation Enrichment

11.1 Base root lookup
  • Translation support comes from: ace/tools/translation_functions.php
  • Dictionary data comes from: ace/data/dictionary_loader.php
  • Root-based lookup prefers a base/Form I semantic anchor
11.2 Candidate enrichment
  • Add English gloss
  • Add Urdu gloss
  • Add base English / base Urdu if available
  • Adjust meanings according to detected form or noun type
11.3 Root support improves confidence

A candidate with a valid supported root and usable gloss generally ranks above a structurally weak candidate with no lexical support.
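
Enrichment can be pictured with a short Python sketch that looks up a candidate's root and, on a hit, attaches base glosses and raises confidence. The dictionary here holds a single sample entry written for the example; the real data is loaded through ace/data/dictionary_loader.php. The function name enrich, the field names, and the root_bonus value are assumptions.

```python
# Toy dictionary keyed by root (sample data for the sketch only).
SAMPLE_DICTIONARY = {"ktb": {"english": "to write", "urdu": "likhna"}}

def enrich(candidate, dictionary=SAMPLE_DICTIONARY, root_bonus=0.25):
    """Return a copy of the candidate with glosses and a confidence bump.

    Candidates whose root has no entry are returned unchanged, so lexical
    support raises a score but its absence does not remove a candidate.
    """
    out = dict(candidate)
    entry = dictionary.get(out.get("root"))
    if entry:
        out["base_english"] = entry["english"]
        out["base_urdu"] = entry["urdu"]
        out["confidence"] = out.get("confidence", 0.0) + root_bonus
    return out

supported = enrich({"token": "kataba", "root": "ktb", "confidence": 0.5})
unsupported = enrich({"token": "kataba", "root": "ktk", "confidence": 0.5})
```

Form-specific adjustment of the gloss, as in 11.2, would be a further step applied after the base entry is attached.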

Stage 12 - Ranking and Final Output

12.1 Rank all collected candidates
  • Exact match strength
  • Family-specific suffix validity
  • Pattern-form agreement
  • Root plausibility
  • Dictionary / translation support
  • Particle-context agreement
  • Penalty for over-stripping or implausible stems
12.2 Best candidate displayed first
  • The top candidate becomes the main visible analysis
  • Alternate matches may still be shown in the match bundle
12.3 Final row output fields
  • Token
  • Prefix
  • Rest
  • POS
  • Category
  • English
  • Urdu
  • Tense / Voice
  • Form
  • Pattern
  • Root
  • Suffix
  • Removed Tail
  • Stem Used
  • Pronoun
  • Confidence
  • Match Count
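
A minimal Python sketch of ranking, assuming each candidate carries a set of boolean signals named after the criteria in 12.1. The weights are invented for illustration and say nothing about the engine's real scoring; the names WEIGHTS, score, and rank are hypothetical.

```python
# Illustrative weights, one per ranking criterion in 12.1.
WEIGHTS = {
    "exact": 3.0,
    "suffix_valid": 1.5,
    "pattern_form": 1.5,
    "root_plausible": 1.0,
    "dictionary": 1.0,
    "context": 0.5,
    "over_stripped": -2.0,  # penalty
}

def score(signals):
    """Sum the weights of the signals that are present for one candidate."""
    return sum(w for name, w in WEIGHTS.items() if signals.get(name))

def rank(candidates):
    """Sort best-first; the sort is stable, so ties keep their input order."""
    return sorted(candidates, key=lambda c: score(c["signals"]), reverse=True)

candidates = [
    {"family": "nominal", "signals": {"suffix_valid": True}},
    {"family": "perfect", "signals": {"suffix_valid": True, "pattern_form": True,
                                      "root_plausible": True, "dictionary": True}},
    {"family": "nominal", "signals": {"pattern_form": True, "over_stripped": True}},
]
ranked = rank(candidates)
best, match_count = ranked[0], len(ranked)
```

The first element of the ranked list corresponds to the main visible analysis, and the list length to the Match Count field.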

Stage 13 - Current Practical Rules

13.1 What ACE should no longer do
  • Do not do one blind suffix strip and then guess
  • Do not misread imperative as broken imperfect
  • Do not allow exact particles/pronouns to be swallowed by morphology
  • Do not label bare stems as nouns without real nominal evidence
13.2 What ACE should do
  • Protect exact/non-derived items early
  • Peel proclitics early and record them separately
  • Route into competing families
  • Strip suffixes in family-aware ways
  • Use particle context to break ties
  • Use dictionary/root support to strengthen valid analyses
13.3 Short summary of the current engine

Current ACE model: normalize → preserve context → peel proclitics → protect exact matches → route by family → test suffix families carefully → enrich → rank → display

This document supersedes the earlier ACE reverse flowchart and reflects the current file-aware router-based design.