Data Quality Dashboard#

The Data Quality Dashboard is your control center for maintaining a clean and accurate Party database. It automatically detects potential duplicates across your Party Masters by normalizing party names and scoring their similarity with fuzzy matching.

Key Capabilities#

Automated Detection: Scans party names using advanced text normalization and fuzzy matching.
Smart Scoring: Assigns a similarity score (0-100%) using the RapidFuzz library.
Merge Action: Consolidate duplicate records into a single master with one click.
Exclusion List: Permanently ignore false positives with dismissal reasons.
Dashboard Statistics: Provides summary metrics including total parties, groups, exclusions, incomplete parties, and potential duplicates.

1. Using the Dashboard#

Navigate to Party > Data Quality Dashboard.
Filter: Adjust the “Minimum Similarity Score” slider to filter results (Default: 70%).
Review Pairs: The dashboard lists potential duplicate pairs with similarity scores.

Understanding the Score#

>90%: Highly likely to be a duplicate (e.g., “Acme Corp” vs “Acme Corp.”).
70-90%: Likely duplicate, check details (e.g., “John Doe” vs “Johnathan Doe”).
<70%: Possible false positive. These pairs appear only if you lower the Minimum Similarity Score below the 70% default.
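
The score bands and the minimum-score filter can be illustrated with a minimal sketch. It assumes the plain `fuzz.ratio` scorer from RapidFuzz; the document does not say which RapidFuzz scorer the dashboard actually calls, and the function names `similarity`, `classify`, and `review_pairs` are hypothetical. If RapidFuzz is not installed, the sketch falls back to `difflib.SequenceMatcher` from the standard library, which is a stand-in and not what the dashboard uses.

```python
from difflib import SequenceMatcher

try:
    # RapidFuzz is the library the dashboard names; fuzz.ratio returns 0-100.
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        return fuzz.ratio(a, b)
except ImportError:
    # Standard-library stand-in so the sketch still runs without RapidFuzz.
    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio() * 100


def classify(score: float) -> str:
    """Map a 0-100 score onto the three bands described above."""
    if score > 90:
        return "highly likely duplicate"
    if score >= 70:
        return "likely duplicate"
    return "possible false positive"


def review_pairs(pairs, min_score=70.0):
    """Score (name_a, name_b) pairs, drop those below min_score, highest first."""
    scored = [(a, b, similarity(a, b)) for a, b in pairs]
    kept = [row for row in scored if row[2] >= min_score]
    return sorted(kept, key=lambda row: row[2], reverse=True)


candidates = [
    ("Acme Corp", "Acme Corp."),
    ("John Doe", "Johnathan Doe"),
    ("Acme Corp", "Zenith Ltd"),
]
for a, b, score in review_pairs(candidates):
    print(f"{score:5.1f}%  {classify(score):24}  {a!r} vs {b!r}")
```

With the default threshold of 70, the third pair is filtered out, which mirrors what the slider does in the dashboard.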

2. Text Normalization#

The dashboard normalizes party names before scoring them, so that spelling and script variants of the same name compare as equal:

Arabic Character Normalization#

Converts different Arabic letter forms to standard forms (e.g., أ, إ, آ → ا)
Handles Persian character variations (e.g., ك → ک, ي → ی)
Removes diacritics (harakat, tanwin)
Normalizes numerals (Eastern Arabic "Indian" digits ٠-٩ → Western Arabic digits 0-9)
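
The four Arabic rules above can be sketched with a single translation table from the standard library. This is an illustration, not the dashboard's own normalizer: the function name `normalize_arabic` is hypothetical, the table covers only the example characters listed above, and the diacritic range U+064B to U+0652 is an assumption about what "harakat, tanwin" includes. Characters are written as code points to avoid right-to-left rendering problems in editors.

```python
# Letter mappings taken from the examples in the list above.
_LETTERS = {
    "\u0623": "\u0627",  # alef with hamza above -> bare alef
    "\u0625": "\u0627",  # alef with hamza below -> bare alef
    "\u0622": "\u0627",  # alef with madda -> bare alef
    "\u0643": "\u06A9",  # Arabic kaf -> Persian keheh
    "\u064A": "\u06CC",  # Arabic yeh -> Farsi yeh
}
# Tanwin, short vowels, shadda and sukun occupy U+064B..U+0652; None deletes them.
_DIACRITICS = {chr(cp): None for cp in range(0x064B, 0x0653)}
# Arabic-Indic digits U+0660..U+0669 -> ASCII digits 0-9.
_DIGITS = {chr(0x0660 + i): str(i) for i in range(10)}

_TABLE = str.maketrans({**_LETTERS, **_DIACRITICS, **_DIGITS})


def normalize_arabic(text: str) -> str:
    """Apply the letter, diacritic, and digit rules in one pass."""
    return text.translate(_TABLE)


# "Ahmad" written with a hamza and full vowel marks vs. the bare spelling.
vocalized = "\u0623\u064E\u062D\u0652\u0645\u064E\u062F"
bare = "\u0627\u062D\u0645\u062F"
print(normalize_arabic(vocalized) == bare)
```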

Latin Character Normalization#

Removes accents and diacritics (e.g., é → e, ñ → n)
Handles common legal-entity prefixes/suffixes (e.g., LLC, Inc, Corp)
Normalizes whitespace and case
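
A minimal standard-library sketch of the Latin rules follows. It assumes that "handles" means dropping legal-form tokens before comparison, which the document does not state explicitly; the name `normalize_latin` and the `LEGAL_TOKENS` set are hypothetical, and the set contains only the three examples given above.

```python
import re
import unicodedata

# Assumption: legal-form tokens are removed. Only the document's examples are listed.
LEGAL_TOKENS = {"llc", "inc", "corp"}


def normalize_latin(name: str) -> str:
    # Decompose accented letters, then drop the combining marks (e -> e + accent).
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case-fold, turn punctuation into spaces, and split on any whitespace run.
    tokens = re.sub(r"[^\w\s]", " ", no_accents.casefold()).split()
    kept = [t for t in tokens if t not in LEGAL_TOKENS]
    # Keep the original tokens if stripping would leave nothing (e.g. a party named "Inc").
    return " ".join(kept or tokens)


print(normalize_latin("  Café  Niño LLC "))
print(normalize_latin("Acme Corp.") == normalize_latin("ACME  corp"))
```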

3. Resolving Duplicates#

For each identified pair, you have two primary actions:

A. Merge#

If the records represent the same entity:

Click Merge.
Confirm the action and select fields to keep from each record.
Result: The secondary party is merged into the primary party. All linked documents (Invoices, Orders) are reassigned to the primary party, and the secondary party is deleted. A duplicate exclusion entry is automatically created.
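
The merge result described above can be modelled in memory as follows. This is only an illustration of the sequence of effects; it does not use the Frappe API, and `Store`, `merge_parties`, and `keep_from_secondary` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    parties: dict = field(default_factory=dict)    # party id -> dict of field values
    documents: list = field(default_factory=list)  # dicts that carry a "party" key
    exclusions: set = field(default_factory=set)   # frozenset({id_a, id_b}) entries


def merge_parties(store, primary, secondary, keep_from_secondary=()):
    """Fold `secondary` into `primary`, mirroring the Result line above."""
    if primary == secondary:
        raise ValueError("cannot merge a party into itself")
    if primary not in store.parties or secondary not in store.parties:
        raise KeyError("both parties must exist before merging")
    # Copy over the fields the user chose to keep from the secondary record.
    for name in keep_from_secondary:
        store.parties[primary][name] = store.parties[secondary][name]
    # Reassign linked documents (invoices, orders) to the primary party.
    for doc in store.documents:
        if doc["party"] == secondary:
            doc["party"] = primary
    # Delete the secondary party and record the exclusion entry.
    del store.parties[secondary]
    store.exclusions.add(frozenset((primary, secondary)))


store = Store(
    parties={
        "P1": {"name": "Acme Corp", "phone": ""},
        "P2": {"name": "Acme Corp.", "phone": "555-0100"},
    },
    documents=[{"type": "Invoice", "party": "P2"}, {"type": "Order", "party": "P1"}],
)
merge_parties(store, "P1", "P2", keep_from_secondary=("phone",))
print(store.parties)
```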

B. Dismiss (Ignore)#

If the records are distinct entities (e.g., “Apple Inc.” vs “Apple Store”):

Click Dismiss.
(Optional) Provide a reason.
Result: The pair is added to the Duplicate Exclusion list and will not reappear in the dashboard.

Undo Dismissal: You can view and restore dismissed pairs by navigating to the Duplicate Exclusion DocType list.
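
An exclusion list of this kind can be sketched as a mapping keyed by an unordered pair, so that dismissing (A, B) also hides (B, A). The class below is a hypothetical in-memory model, not the Duplicate Exclusion DocType itself, and it assumes that restoring a pair means removing its exclusion entry.

```python
class ExclusionList:
    """In-memory model of dismissed duplicate pairs with optional reasons."""

    def __init__(self):
        self._reasons = {}  # frozenset({id_a, id_b}) -> reason text

    def dismiss(self, a, b, reason=""):
        self._reasons[frozenset((a, b))] = reason

    def restore(self, a, b):
        # Assumption: undoing a dismissal deletes the exclusion entry.
        self._reasons.pop(frozenset((a, b)), None)

    def visible(self, pairs):
        """Return only the pairs that have not been dismissed."""
        return [(a, b) for a, b in pairs if frozenset((a, b)) not in self._reasons]


exclusions = ExclusionList()
pairs = [("Apple Inc.", "Apple Store"), ("Acme Corp", "Acme Corp.")]
exclusions.dismiss("Apple Store", "Apple Inc.", reason="Retail outlet, separate entity")
print(exclusions.visible(pairs))
```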

4. Dashboard Statistics#

The dashboard provides real-time statistics:

Total Parties: Number of active Party Masters (non-group)
Total Groups: Number of Party Master groups
Total Exclusions: Number of dismissed duplicate pairs
Incomplete Parties: Number of parties without party numbers
Potential Duplicates: Number of potential duplicate pairs (based on current filters)
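
As a rough sketch, the five metrics could be derived from party records like this. The field names `is_group`, `disabled`, and `party_number` are assumptions made for illustration, as is the choice to count incomplete parties among active non-group records only.

```python
def dashboard_stats(parties, exclusions, scored_pairs, min_score=70.0):
    """Compute the five dashboard metrics from plain Python data."""
    active = [p for p in parties if not p["is_group"] and not p.get("disabled", False)]
    return {
        "total_parties": len(active),
        "total_groups": sum(1 for p in parties if p["is_group"]),
        "total_exclusions": len(exclusions),
        # A missing or empty party number marks a party as incomplete.
        "incomplete_parties": sum(1 for p in active if not p.get("party_number")),
        # Depends on the current filter, here only the minimum score.
        "potential_duplicates": sum(1 for _, _, s in scored_pairs if s >= min_score),
    }


parties = [
    {"name": "Customers", "is_group": True},
    {"name": "Acme Corp", "is_group": False, "party_number": "P-001"},
    {"name": "Acme Corp.", "is_group": False, "party_number": ""},
    {"name": "Old Vendor", "is_group": False, "party_number": "P-002", "disabled": True},
]
scored_pairs = [("Acme Corp", "Acme Corp.", 94.7), ("Acme Corp", "Old Vendor", 31.0)]
print(dashboard_stats(parties, exclusions=[], scored_pairs=scored_pairs))
```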

5. Performance Optimization#

Uses a sampling approach for duplicate detection to keep scans fast
Results are paginated to handle large datasets efficiently
Normalization and similarity scoring are optimized for Arabic text
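
The document does not describe how the sampling works, so the following is one possible interpretation rather than the dashboard's method: draw a random subset of party names, compare only pairs within that subset, and return results one page at a time. The names `sample_pairs` and `paginate`, and the fixed seed, are hypothetical.

```python
import random
from itertools import combinations


def sample_pairs(names, sample_size, seed=0):
    """Build candidate pairs from a random sample instead of every party."""
    rng = random.Random(seed)  # seeded so repeated runs give the same sample
    chosen = list(names) if len(names) <= sample_size else rng.sample(list(names), sample_size)
    # n sampled names yield n * (n - 1) / 2 pairs instead of the full quadratic set.
    return list(combinations(chosen, 2))


def paginate(rows, page, page_size):
    """Return one 1-indexed page of rows; pages past the end are empty."""
    start = (page - 1) * page_size
    return rows[start:start + page_size]


names = [f"Party {i}" for i in range(100)]
pairs = sample_pairs(names, sample_size=10)
print(len(pairs), len(paginate(pairs, page=3, page_size=20)))
```

Comparing all 100 names would produce 4,950 pairs; sampling 10 reduces that to 45, at the cost of missing duplicates that fall outside the sample.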

6. Technical Implementation#

The Data Quality Dashboard is implemented using:

Backend: Python with RapidFuzz library for fuzzy matching
Frontend: Frappe Framework with Vue.js
Normalization: Custom implementation handling Arabic, Persian, and Latin characters
Storage: Duplicate exclusion records stored in Duplicate Exclusion DocType