I have a database of demographic and lifestyle data, bought in from a
third party, that I am trying to match to a consumer database. I am
currently using postcode, surname, initial and title (and subsets of
these) to match the two datasets. I use aggregated data for consumers that
can only be matched by postcode.
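
As a rough illustration of this kind of tiered matching, here is a minimal sketch in Python (a real solution would ideally be in SAS or SQL). The field names, the order of the match levels, and the sample records are all assumptions made up for the example. It tries the strictest key first, keeps only keys that identify exactly one record on each side, and then relaxes the key for whatever is left.

```python
from collections import defaultdict

# Hypothetical match levels, strictest first; field names are assumptions.
LEVELS = [
    ("postcode", "surname", "initial", "title"),
    ("postcode", "surname", "initial"),
    ("postcode", "surname"),
]

def norm(value):
    """Upper-case and strip all whitespace so 'ab1 2cd' equals 'AB12CD'."""
    return "".join(str(value).upper().split())

def index_by(records, fields):
    """Group record ids by their normalised key on the given fields."""
    idx = defaultdict(list)
    for rec in records:
        idx[tuple(norm(rec[f]) for f in fields)].append(rec["id"])
    return idx

def cascade_match(demographic, consumers):
    """Return {consumer id: (demographic id, level)} for 1-1 matches only."""
    matched = {}
    used_demo = set()
    for level, fields in enumerate(LEVELS, start=1):
        # Only records not matched at a stricter level take part.
        cons_idx = index_by([c for c in consumers if c["id"] not in matched], fields)
        demo_idx = index_by([d for d in demographic if d["id"] not in used_demo], fields)
        for key, cons_ids in cons_idx.items():
            demo_ids = demo_idx.get(key, [])
            # Accept a key only if it is unique on both sides (no duplicates).
            if len(cons_ids) == 1 and len(demo_ids) == 1:
                matched[cons_ids[0]] = (demo_ids[0], level)
                used_demo.add(demo_ids[0])
    return matched

# Made-up sample records.
demographic = [
    {"id": "D1", "postcode": "AB1 2CD", "surname": "Smith", "initial": "J", "title": "Mr"},
    {"id": "D2", "postcode": "EF3 4GH", "surname": "Jones", "initial": "A", "title": "Mrs"},
    {"id": "D3", "postcode": "EF3 4GH", "surname": "Jones", "initial": "B", "title": "Mr"},
]
consumers = [
    {"id": "C1", "postcode": "ab12cd", "surname": "SMITH", "initial": "J", "title": "Mr"},
    {"id": "C2", "postcode": "EF3 4GH", "surname": "Jones", "initial": "A", "title": "Ms"},
    {"id": "C3", "postcode": "EF3 4GH", "surname": "Jones", "initial": "", "title": ""},
]

result = cascade_match(demographic, consumers)
```

Recording the level alongside each match makes it possible to report how many matches were made at each level, and anything still unmatched after the last level would fall back to postcode-level aggregates.
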
I would like to increase the number of one-to-one matches (no duplicates)
I can make between the two datasets and also reduce the number of matches
made only at postcode level.
I have full name and address available to me in both datasets, but not
gender or date of birth.
I have thought about Soundex codes for surnames to overcome data quality
issues.
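
To make the Soundex idea concrete, here is a minimal sketch in Python, assuming the standard American Soundex rules (the SOUNDEX functions built into SAS or a given SQL dialect may differ in detail, for example in code length). The field names and sample records are made up. It replaces the surname in the match key with its Soundex code and keeps only keys that are unique in both datasets.

```python
from collections import defaultdict

# Letter-to-digit table for standard American Soundex.
_CODES = {}
for _letters, _digit in [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")]:
    for _ch in _letters:
        _CODES[_ch] = _digit

def soundex(name):
    """Return the 4-character Soundex code, or '' if there are no A-Z letters."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # vowels reset prev, so a repeated code is kept
    return (code + "000")[:4]

def match_key(rec):
    """Hypothetical key: postcode without spaces, surname Soundex, initial."""
    postcode = "".join(rec["postcode"].upper().split())
    return (postcode, soundex(rec["surname"]), rec["initial"].upper())

def one_to_one_matches(demographic, consumers):
    """Pair records whose key occurs exactly once in each dataset."""
    demo_by_key, cons_by_key = defaultdict(list), defaultdict(list)
    for rec in demographic:
        demo_by_key[match_key(rec)].append(rec["id"])
    for rec in consumers:
        cons_by_key[match_key(rec)].append(rec["id"])
    return [(ids[0], cons_by_key[key][0])
            for key, ids in demo_by_key.items()
            if len(ids) == 1 and len(cons_by_key.get(key, [])) == 1]

# Made-up sample records with spelling variants.
demographic = [
    {"id": "D1", "postcode": "AB1 2CD", "surname": "Smyth", "initial": "J"},
    {"id": "D2", "postcode": "EF3 4GH", "surname": "Jonson", "initial": "A"},
]
consumers = [
    {"id": "C1", "postcode": "ab12cd", "surname": "Smith", "initial": "j"},
    {"id": "C2", "postcode": "EF3 4GH", "surname": "Johnson", "initial": "A"},
    {"id": "C3", "postcode": "EF3 4GH", "surname": "Johnsen", "initial": "A"},
]

matches = one_to_one_matches(demographic, consumers)
```

In this sample, Smyth and Smith share a code and pair up, while Jonson has two phonetically identical candidates (Johnson and Johnsen) and is left unmatched. A phonetic key can therefore recover misspelt surnames but may also create new duplicates that an exact surname key would have avoided.
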
Can anyone suggest a method/useful references that would achieve the
desired improvements, ideally using SAS or SQL?