What are the Data Quality dimensions for Address?
Completeness checks that all required fields (street, city, postal_code, country) are present and non-empty.
For certain countries (France, Japan, Germany, United Kingdom, United States), the state field is also required.
If any required field is missing or empty, completeness is 0; otherwise, 1.
Examples:
Input: street=1 rue du Louvre, city=Paris, postal_code=75001, country=France, state=Île-de-France → Completeness = 1
Input: street=123 Main St, city=New York, postal_code=10001, country=USA, state= (missing) → Completeness = 0
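The completeness rule can be sketched in Python as follows. This is a minimal illustration, not the product's implementation. The alias table `COUNTRY_ALIASES` is a hypothetical stand-in for the geo database lookup. It exists only so that "USA" from the second example resolves to "United States". The helper names are also illustrative.

```python
from typing import Optional

# Hypothetical alias table; the product resolves country aliases with a geo
# database whose contents are not documented here.
COUNTRY_ALIASES = {"usa": "united states", "us": "united states", "uk": "united kingdom"}
STATE_REQUIRED_COUNTRIES = {"france", "japan", "germany", "united kingdom", "united states"}
BASE_REQUIRED_FIELDS = ("street", "city", "postal_code", "country")


def canonical_country(country: Optional[str]) -> str:
    """Lowercase the country name and map known aliases (e.g. 'USA') to one name."""
    key = (country or "").strip().lower()
    return COUNTRY_ALIASES.get(key, key)


def is_filled(value: Optional[str]) -> bool:
    """A field counts as present only if it is not None and not blank."""
    return value is not None and value.strip() != ""


def completeness(address: dict) -> int:
    """Return 1 if every required field is present and non-empty, else 0."""
    required = list(BASE_REQUIRED_FIELDS)
    if canonical_country(address.get("country")) in STATE_REQUIRED_COUNTRIES:
        required.append("state")  # state is mandatory for these countries
    return 1 if all(is_filled(address.get(field)) for field in required) else 0


paris = {"street": "1 rue du Louvre", "city": "Paris", "postal_code": "75001",
         "country": "France", "state": "Île-de-France"}
new_york = {"street": "123 Main St", "city": "New York", "postal_code": "10001",
            "country": "USA", "state": ""}
print(completeness(paris))     # 1
print(completeness(new_york))  # 0
```

Because the score is all-or-nothing, a single blank field (here the empty `state` for a US address) is enough to drop completeness to 0.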
Validity determines whether the address matches a real-world address with high confidence and completeness.
How validity is computed:
If there is no normalized match from the geocoding analysis, validity is 0.
If a match exists, the following conditions must all be true for validity to be 1:
The normalized address and the geocoded match do not differ.
The geocoding match's score is above the threshold (0.75 by default).
The match is complete and mapped: all required fields (street_number, street, postal_code, country, city, and, for some countries, state) are present in both the input and the match.
If any of these conditions fails, validity is 0.
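The three validity conditions can be expressed as a small Python sketch. The document does not say how "do not differ" is decided. This sketch assumes it means that every required field is equal after strong normalization, and that assumption is flagged in a comment. The function names and the dictionary shape of the address and match are hypothetical.

```python
import string
import unicodedata
from typing import Optional

STATE_REQUIRED_COUNTRIES = {"france", "japan", "germany", "united kingdom", "united states"}
VALIDITY_FIELDS = ("street_number", "street", "postal_code", "country", "city")


def strong_normalize(value: Optional[str]) -> str:
    """Remove accents and punctuation, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", value or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.lower().split())


def validity(normalized: dict, match: Optional[dict], score: float,
             threshold: float = 0.75) -> int:
    """Return 1 only if all three validity conditions hold, else 0."""
    if match is None:  # no normalized match from the geocoding analysis
        return 0
    fields = list(VALIDITY_FIELDS)
    if strong_normalize(normalized.get("country")) in STATE_REQUIRED_COUNTRIES:
        fields.append("state")
    pairs = [(strong_normalize(normalized.get(f)), strong_normalize(match.get(f)))
             for f in fields]
    # Assumption: "do not differ" means each required field is equal after
    # strong normalization; the real comparison is not documented.
    not_different = all(a == b for a, b in pairs)
    above_threshold = score > threshold
    # Complete and mapped: each required field is non-empty on both sides.
    complete_and_mapped = all(a and b for a, b in pairs)
    return 1 if (not_different and above_threshold and complete_and_mapped) else 0
```

With a matching address and a score of 0.9 this returns 1. Lowering the score to 0.7, passing `None` as the match, or blanking the match's `state` each returns 0.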
Consistency checks whether, for each required field (street, city, postal code, state, country), the original and normalized values are equal after strong normalization (removing accents and punctuation, and lowercasing). If any required field differs after strong normalization, consistency is 0; otherwise, 1.
Examples:
Original: paris fRance, Normalized: Paris France → Consistency = 1
Original: Paris, Normalized: Lyon → Consistency = 0
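Here is a minimal Python sketch of strong normalization and the consistency check. It is an illustration under a few assumptions. "Punctuation" is taken to mean ASCII punctuation (`string.punctuation`). A field missing from both records is treated as equal. The two examples are placed in a `city` field purely for demonstration.

```python
import string
import unicodedata
from typing import Optional

REQUIRED_FIELDS = ("street", "city", "postal_code", "state", "country")


def strong_normalize(value: Optional[str]) -> str:
    text = unicodedata.normalize("NFKD", value or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))  # drop accents
    text = "".join(ch for ch in text if ch not in string.punctuation)   # drop ASCII punctuation
    return " ".join(text.lower().split())                               # lowercase, tidy spaces


def consistency(original: dict, normalized: dict) -> int:
    """Return 0 as soon as one required field differs after strong normalization."""
    for field in REQUIRED_FIELDS:
        if strong_normalize(original.get(field)) != strong_normalize(normalized.get(field)):
            return 0
    return 1


print(consistency({"city": "paris fRance"}, {"city": "Paris France"}))  # 1
print(consistency({"city": "Paris"}, {"city": "Lyon"}))                 # 0
```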
Normalization cleans and standardizes the input fields.
Each address field is normalized using specialized logic:
Street:
Cleaned and standardized for casing and accents. Street types are compressed to their preferred short form (e.g., 'avenue' → 'Av', 'boulevard' → 'Bd'), while particles such as 'de', 'du', and 'la' are kept in a standard lowercase form for consistency.
Examples:
Input: rue du Louvre → Normalized: Rue du Louvre
Input: Avenue de l'Opéra → Normalized: Av de l'Opera
Input: 43-45 boulevard Saint-Germain → Normalized: 43-45 Bd Saint-Germain
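The street rules can be approximated with a short Python sketch that reproduces the three examples. The abbreviation table, particle list, and elided prefixes (`l'`, `d'`) are hypothetical. The product's actual lists are not documented. Note also that lowercasing before capitalizing would flatten names like "McDonald".

```python
import unicodedata

# Hypothetical lookup tables; the real abbreviation and particle lists are not documented.
STREET_TYPE_ABBREVIATIONS = {"avenue": "Av", "boulevard": "Bd"}
PARTICLES = {"de", "du", "des", "la", "le", "les"}
ELIDED_PREFIXES = ("l'", "d'")


def strip_accents(text: str) -> str:
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def capitalize_hyphenated(word: str) -> str:
    """Capitalize each hyphen-separated part: 'saint-germain' -> 'Saint-Germain'."""
    return "-".join(part[:1].upper() + part[1:] for part in word.split("-"))


def normalize_street(street: str) -> str:
    tokens = []
    for raw in strip_accents(street).split():
        word = raw.lower()
        if word in STREET_TYPE_ABBREVIATIONS:
            tokens.append(STREET_TYPE_ABBREVIATIONS[word])      # avenue -> Av
        elif word in PARTICLES:
            tokens.append(word)                                 # keep particles lowercase
        elif word.startswith(ELIDED_PREFIXES):
            tokens.append(word[:2] + capitalize_hyphenated(word[2:]))  # l'opera -> l'Opera
        else:
            tokens.append(capitalize_hyphenated(word))
    return " ".join(tokens)


print(normalize_street("rue du Louvre"))                  # Rue du Louvre
print(normalize_street("Avenue de l'Opéra"))              # Av de l'Opera
print(normalize_street("43-45 boulevard Saint-Germain"))  # 43-45 Bd Saint-Germain
```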
City/State:
Normalized for casing and accents; unwanted punctuation (commas, parentheses) is removed.
Examples:
Input: paris → Normalized: Paris
Input: saint-denis (île-de-france) → Normalized: Saint-Denis Ile-De-France
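The following Python sketch reproduces the city/state examples. One detail is an assumption: commas and parentheses are replaced with spaces rather than deleted outright, so that "Paris,France" does not collapse into one word. Every hyphen-separated part is capitalized, matching "Ile-De-France" in the example.

```python
import unicodedata

UNWANTED_PUNCTUATION = ",()"


def normalize_city_or_state(value: str) -> str:
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: unwanted punctuation becomes a space, then whitespace is collapsed.
    cleaned = "".join(" " if ch in UNWANTED_PUNCTUATION else ch for ch in no_accents)
    words = []
    for word in cleaned.lower().split():
        # Capitalize every hyphen-separated part: "ile-de-france" -> "Ile-De-France"
        words.append("-".join(part[:1].upper() + part[1:] for part in word.split("-")))
    return " ".join(words)


print(normalize_city_or_state("paris"))                        # Paris
print(normalize_city_or_state("saint-denis (île-de-france)"))  # Saint-Denis Ile-De-France
```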
Postal Code:
Uppercased and cleaned; special characters, such as the space in UK postcodes, are kept.
Examples:
Input: 75001 → Normalized: 75001
Input: sw1a 1aa → Normalized: SW1A 1AA
Country:
Normalized for casing and accents, and known aliases are resolved using a geo database. The result is title-cased for presentation.
Examples:
Input: france → Normalized: France
Input: deutschland → Normalized: Germany
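Country normalization can be sketched in Python with a plain dictionary standing in for the geo database. `COUNTRY_ALIASES` is hypothetical and deliberately tiny. A real alias lookup would cover far more names and languages.

```python
import unicodedata

# Hypothetical alias table standing in for the geo database lookup.
COUNTRY_ALIASES = {
    "deutschland": "germany",
    "allemagne": "germany",
    "usa": "united states",
    "uk": "united kingdom",
}


def normalize_country(value: str) -> str:
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    key = " ".join(no_accents.lower().split())   # case- and whitespace-insensitive key
    canonical = COUNTRY_ALIASES.get(key, key)    # resolve known aliases
    return canonical.title()                     # title-case for presentation


print(normalize_country("france"))       # France
print(normalize_country("deutschland"))  # Germany
```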
Country Code:
Uppercased, with special characters removed and non-ASCII characters converted to ASCII.
Examples:
Input: fr → Normalized: FR
Input: de → Normalized: DE
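A minimal Python sketch of the country-code rule follows. It assumes that "converts non-ASCII" means decomposing accented letters and dropping anything that has no ASCII form. It also assumes that "special characters" means anything other than letters and digits.

```python
import unicodedata


def normalize_country_code(value: str) -> str:
    # Decompose accented letters, then drop whatever cannot be encoded as ASCII.
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    # Keep only letters and digits (assumed meaning of "special characters").
    cleaned = "".join(ch for ch in ascii_only if ch.isalnum())
    return cleaned.upper()


print(normalize_country_code("fr"))     # FR
print(normalize_country_code(" d.e "))  # DE
```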
How recommendations are generated:
If there is no normalized match, no recommendation is returned and accuracy is set to -1.
If a match exists:
The geocoded match is used as the recommendation.
If the normalized input street (after strong normalization) contains the geocoded match's street, the match's street is replaced with the input's street to preserve the original formatting.
The accuracy is computed as:
accuracy = geocoding_score * completeness_of_recommendation
If the computed accuracy is greater than 0.5, the recommendation is returned; otherwise, no recommendation is returned and accuracy is set to 0.
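The recommendation step can be sketched in Python as follows. The sketch makes several assumptions. The match's street is strong-normalized before the containment check. "Contains" means substring containment. An empty match street never triggers the replacement. The caller supplies `completeness_of_recommendation`. The function name `recommend` is illustrative.

```python
import string
import unicodedata
from typing import Optional, Tuple


def strong_normalize(value: Optional[str]) -> str:
    text = unicodedata.normalize("NFKD", value or "")
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.lower().split())


def recommend(normalized_input: dict, match: Optional[dict], geocoding_score: float,
              completeness_of_recommendation: float) -> Tuple[Optional[dict], float]:
    """Return (recommendation, accuracy)."""
    if match is None:
        return None, -1.0  # no normalized match: no recommendation, accuracy -1
    recommendation = dict(match)  # the geocoded match is the recommendation
    input_street = normalized_input.get("street") or ""
    match_street = strong_normalize(match.get("street"))
    # Keep the input's formatting when its street contains the match's street
    # (an empty match street is skipped, since "" is contained in every string).
    if match_street and match_street in strong_normalize(input_street):
        recommendation["street"] = input_street
    accuracy = geocoding_score * completeness_of_recommendation
    if accuracy > 0.5:
        return recommendation, accuracy
    return None, 0.0  # too weak: drop the recommendation, accuracy 0
```

For example, an input street "Rue du Louvre" matched to "rue du louvre" with a score of 0.9 and complete recommendation returns the match with the street rewritten to "Rue du Louvre" and an accuracy of 0.9.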
How the status is assigned:
If accuracy is -1: label is "Unknown".
If accuracy is below the threshold (0.75 by default): label is "No".
If accuracy is greater than or equal to the threshold: label is "Ok".
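The status mapping is small enough to express directly. This is a sketch, and the function name `assign_status` is hypothetical. One consequence worth noting: an accuracy between 0.5 and 0.75 still yields a recommendation but is labeled "No", while a dropped recommendation (accuracy 0) is also "No".

```python
def assign_status(accuracy: float, threshold: float = 0.75) -> str:
    if accuracy == -1:
        return "Unknown"  # no normalized match was found
    if accuracy < threshold:
        return "No"
    return "Ok"           # accuracy >= threshold


print(assign_status(-1))    # Unknown
print(assign_status(0.6))   # No
print(assign_status(0.75))  # Ok
```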