Data Quality Dashboard
The Data Quality Dashboard is your control center for maintaining a clean and accurate Party database. It automatically detects potential duplicates across your Party Masters using intelligent similarity scoring and normalization techniques.
Key Capabilities#
- Automated Detection: Scans party names using advanced text normalization and fuzzy matching.
- Smart Scoring: Assigns each candidate pair a similarity score (0-100%) using the RapidFuzz library.
- Merge Action: Consolidate duplicate records into a single master with one click.
- Exclusion List: Permanently ignore false positives with dismissal reasons.
- Dashboard Statistics: Provides summary metrics including total parties, groups, exclusions, and potential duplicates.
1. Using the Dashboard#
- Navigate to Party > Data Quality Dashboard.
- Filter: Adjust the “Minimum Similarity Score” slider to filter results (Default: 70%).
- Review Pairs: The dashboard lists potential duplicate pairs with similarity scores.
Understanding the Score#
- >90%: Highly likely to be a duplicate (e.g., “Acme Corp” vs “Acme Corp.”).
- 70-90%: Likely duplicate, check details (e.g., “John Doe” vs “Johnathan Doe”).
- <70%: Possible false positive. These pairs are hidden at the default 70% minimum and appear only if you lower the slider.
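The bands above can be sketched in a few lines of Python. This is a minimal illustration, not the dashboard's actual code: it uses `difflib.SequenceMatcher` from the standard library as a stand-in for RapidFuzz so that it runs without dependencies, and the function names `similarity` and `band` are hypothetical. With RapidFuzz installed, `rapidfuzz.fuzz.ratio(a, b)` returns a comparable 0-100 score, though the exact values may differ.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0-100 similarity score (stdlib stand-in for RapidFuzz)."""
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def band(score: float) -> str:
    """Map a score to the bands described above."""
    if score > 90:
        return "highly likely duplicate"
    if score >= 70:
        return "likely duplicate, check details"
    return "possible false positive"


pairs = [
    ("Acme Corp", "Acme Corp."),
    ("John Doe", "Johnathan Doe"),
    ("Apple Inc.", "Zenith Trading"),  # unrelated names, for contrast
]
for a, b in pairs:
    score = similarity(a, b)
    print(f"{a!r} vs {b!r}: {score:.1f}% -> {band(score)}")
```

With this stand-in scorer, the first pair lands above 90% and the second falls in the 70-90% band, matching the examples in the list.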
2. Text Normalization#
The dashboard uses advanced text normalization to improve duplicate detection accuracy:
Arabic Character Normalization#
- Converts different Arabic letter forms to standard forms (e.g., أ, إ, آ → ا)
- Handles Persian character variations (e.g., ك → ک, ي → ی)
- Removes diacritics (harakat, tanwin)
- Normalizes numerals (Arabic-Indic digits ٠-٩ → Western Arabic digits 0-9)
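The Arabic rules listed above could look roughly like this in Python. This is a sketch under stated assumptions: the mapping table contains only the example characters from the list, the function name `normalize_arabic` is hypothetical, and diacritics are removed by dropping every Unicode nonspacing mark, which is broader than harakat and tanwin alone.

```python
import unicodedata

# Mapping built from the examples in the list above (not an exhaustive table).
_TABLE = {
    0x0623: "\u0627",  # أ (alef with hamza above) -> ا
    0x0625: "\u0627",  # إ (alef with hamza below) -> ا
    0x0622: "\u0627",  # آ (alef with madda)       -> ا
    0x0643: "\u06A9",  # ك (Arabic kaf)            -> ک
    0x064A: "\u06CC",  # ي (Arabic yeh)            -> ی
}
# Arabic-Indic digits U+0660..U+0669 -> ASCII digits 0..9.
_TABLE.update({0x0660 + i: str(i) for i in range(10)})


def normalize_arabic(text: str) -> str:
    """Unify letter forms and digits, then drop combining marks."""
    text = text.translate(_TABLE)
    # Harakat and tanwin are nonspacing marks (Unicode category "Mn").
    return "".join(ch for ch in text if unicodedata.category(ch) != "Mn")


# A vocalized name loses its diacritics; digits become ASCII.
print(normalize_arabic("\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F"))
print(normalize_arabic("\u0661\u0662\u0663"))
```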
Latin Character Normalization#
- Removes accents and diacritics (e.g., é → e, ñ → n)
- Handles common prefixes/suffixes (e.g., LLC, Inc, Corp)
- Normalizes whitespace and case
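As an illustration of the Latin rules, the sketch below strips accents with Unicode decomposition, folds case, collapses whitespace, and drops legal-form tokens. The name `normalize_latin` and the `LEGAL_FORMS` set are hypothetical, and removing the tokens outright is an assumption: the list above only says they are "handled".

```python
import re
import unicodedata

# Assumed token list, taken from the examples above.
LEGAL_FORMS = {"llc", "inc", "corp"}


def normalize_latin(name: str) -> str:
    """Return a case-, accent-, and whitespace-insensitive form of a name."""
    # Decompose accented letters, then drop the combining marks (é -> e).
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case and turn punctuation into spaces so "Corp." matches "Corp".
    tokens = re.sub(r"[^\w\s]", " ", stripped.casefold()).split()
    # Drop legal-form tokens (assumption: they are removed, not just flagged).
    return " ".join(t for t in tokens if t not in LEGAL_FORMS)


print(normalize_latin("  Café  Niño, LLC "))  # cafe nino
print(normalize_latin("Acme Corp.") == normalize_latin("ACME corp"))  # True
```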
3. Resolving Duplicates#
For each identified pair, you have two primary actions:
A. Merge#
If the records represent the same entity:
- Click Merge.
- Confirm the action and select fields to keep from each record.
- Result: The secondary party is merged into the primary party. All linked documents (Invoices, Orders) are reassigned to the primary party, and the secondary party is deleted. A duplicate exclusion entry is automatically created.
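The effect of a merge can be modelled on plain Python data structures. The sketch below is not the Frappe implementation and does not call any Frappe API; `merge_parties`, its parameters, and the sample records are all hypothetical and only mirror the steps in the result described above.

```python
def merge_parties(parties, documents, exclusions, primary, secondary,
                  keep_from_secondary=()):
    """Merge `secondary` into `primary` using in-memory dicts and lists."""
    # Copy only the fields the user chose to keep from the secondary record.
    for field in keep_from_secondary:
        parties[primary][field] = parties[secondary][field]
    # Reassign every linked document (invoice, order, ...) to the primary.
    for doc in documents:
        if doc["party"] == secondary:
            doc["party"] = primary
    # Delete the secondary party and record the pair as an exclusion.
    del parties[secondary]
    exclusions.add(frozenset((primary, secondary)))


parties = {
    "P-001": {"name": "Acme Corp", "phone": None},
    "P-002": {"name": "Acme Corp.", "phone": "555-0100"},
}
documents = [
    {"id": "INV-1", "party": "P-002"},
    {"id": "ORD-7", "party": "P-001"},
]
exclusions = set()

merge_parties(parties, documents, exclusions, "P-001", "P-002",
              keep_from_secondary=["phone"])
print(parties)
print(documents)
```

After the call, only `P-001` remains, it carries the phone number taken from `P-002`, and both documents point to `P-001`.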
B. Dismiss (Ignore)#
If the records are distinct entities (e.g., “Apple Inc.” vs “Apple Store”):
- Click Dismiss.
- (Optional) Provide a reason.
- Result: The pair is added to the Duplicate Exclusion list and will not reappear in the dashboard.
Undo Dismissal: You can view and restore dismissed pairs from the Duplicate Exclusion DocType list.
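One way to picture dismissal and restoration is an exclusion list keyed on unordered pairs, so that (A, B) and (B, A) count as the same pair. The `ExclusionList` class below is a hypothetical in-memory model, not the Duplicate Exclusion DocType itself, and it assumes that restoring a pair means removing its exclusion entry.

```python
class ExclusionList:
    """In-memory model of dismissed duplicate pairs."""

    def __init__(self):
        self._reasons = {}  # frozenset({a, b}) -> dismissal reason

    def dismiss(self, a, b, reason=""):
        # frozenset makes the key independent of argument order.
        self._reasons[frozenset((a, b))] = reason

    def restore(self, a, b):
        # Assumption: restoring a pair simply removes its exclusion entry.
        self._reasons.pop(frozenset((a, b)), None)

    def filter(self, pairs):
        """Return only the pairs that have not been dismissed."""
        return [(a, b) for a, b in pairs
                if frozenset((a, b)) not in self._reasons]


exclusions = ExclusionList()
exclusions.dismiss("Apple Inc.", "Apple Store", reason="different entities")

candidates = [("Apple Store", "Apple Inc."), ("Acme Corp", "Acme Corp.")]
print(exclusions.filter(candidates))  # only the Acme pair remains

exclusions.restore("Apple Store", "Apple Inc.")
print(exclusions.filter(candidates))  # both pairs are listed again
```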
4. Dashboard Statistics#
The dashboard provides real-time statistics:
- Total Parties: Number of active Party Masters (non-group)
- Total Groups: Number of Party Master groups
- Total Exclusions: Number of dismissed duplicate pairs
- Incomplete Parties: Number of parties without party numbers
- Potential Duplicates: Number of potential duplicate pairs (based on current filters)
5. Performance Optimization#
- Uses a sampling approach for duplicate detection to keep response times low
- Results are paginated to handle large datasets efficiently
- Normalization and similarity scoring are optimized for Arabic text
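The sampling and pagination ideas above can be sketched as follows. How the dashboard actually draws its sample is not stated here, so this example assumes a uniform random sample; `sample_pairs`, `paginate`, and their parameters are hypothetical names.

```python
import random
from itertools import combinations


def sample_pairs(names, sample_size, seed=0):
    """Pair up a random sample of names instead of all n*(n-1)/2 pairs."""
    rng = random.Random(seed)
    if len(names) <= sample_size:
        sample = list(names)
    else:
        sample = rng.sample(names, sample_size)
    return list(combinations(sample, 2))


def paginate(items, page, page_size):
    """Return the 1-indexed `page` of `items`."""
    start = (page - 1) * page_size
    return items[start:start + page_size]


names = [f"Party {i}" for i in range(1000)]
pairs = sample_pairs(names, sample_size=50)
print(len(pairs))                    # 1225 pairs instead of 499500
print(len(paginate(pairs, 1, 20)))   # 20
```

The trade-off of any sampling scheme is that pairs outside the sample are not compared in that run.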
6. Technical Implementation#
The Data Quality Dashboard is implemented using:
- Backend: Python with RapidFuzz library for fuzzy matching
- Frontend: Frappe Framework with Vue.js
- Normalization: Custom implementation handling Arabic, Persian, and Latin characters
- Storage: Duplicate exclusion records are stored in the Duplicate Exclusion DocType