Chris,

 

Before moving to SCHIN three months ago, I worked for Acxiom, the
UK's largest data warehousing and data hygiene company.

The simple answer is there are no easy methods of de-duping large
amounts of personal data.

We spent over 10 years perfecting our algorithms (and acquiring other
companies) to get where we were. 

We also had to purchase many expensive lookup lists, such as Royal Mail's
(or Consignia's, or whatever they're called now) PAF files, the electoral
register, goneaway files, etc.

 

All I can suggest is that you get your IT dept to develop a series of
string comparison queries for each field in your data set.
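
As a rough illustration of what field-by-field string comparison might look like, here is a minimal Python sketch. The field names, the sample records and the 0.85 threshold are all made up for the example, and it uses the standard library's difflib rather than anything resembling a commercial matching engine.

```python
from difflib import SequenceMatcher

# Hypothetical field names; substitute the columns in your own data set.
FIELDS = ("name", "address", "postcode")


def normalise(value):
    """Lower-case, drop punctuation and collapse runs of whitespace."""
    cleaned = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def field_similarity(a, b):
    """Similarity between two field values, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_duplicates(records, threshold=0.85):
    """Return index pairs whose average per-field similarity meets the threshold."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            scores = [field_similarity(records[i][f], records[j][f]) for f in FIELDS]
            if sum(scores) / len(scores) >= threshold:
                pairs.append((i, j))
    return pairs


# Invented sample data, for illustration only.
records = [
    {"name": "Mr John Smith", "address": "12 High Street", "postcode": "NE1 2ES"},
    {"name": "Mr. Jon Smith", "address": "12 High St.", "postcode": "ne1 2es"},
    {"name": "Jane Doe", "address": "4 Mill Lane", "postcode": "TW13 7RR"},
]
print(likely_duplicates(records))
```

This compares every record with every other record, so it only suits small files. The threshold would need tuning against pairs you have checked by hand.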

If you haven't got many records (fewer than 1 million) you could load the
data into MS Access and use the built-in deduping wizard.
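
For exact matches, a query that groups on the fields you care about and keeps the groups with more than one row will find the duplicates. The sketch below is an assumption-laden stand-in for that idea, not the Access wizard itself: it uses Python's built-in sqlite3 module, and the table name, columns and rows are invented.

```python
import sqlite3

# In-memory database with a hypothetical contacts table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, surname TEXT, postcode TEXT)"
)
conn.executemany(
    "INSERT INTO contacts (surname, postcode) VALUES (?, ?)",
    [("Smith", "NE1 2ES"), ("smith", "ne1 2es "), ("Doe", "TW13 7RR")],
)

# Group on tidied-up keys and keep only groups holding more than one row.
duplicate_groups = conn.execute(
    """
    SELECT UPPER(TRIM(surname)) AS surname_key,
           UPPER(TRIM(postcode)) AS postcode_key,
           COUNT(*) AS copies
    FROM contacts
    GROUP BY UPPER(TRIM(surname)), UPPER(TRIM(postcode))
    HAVING COUNT(*) > 1
    """
).fetchall()
print(duplicate_groups)
```

This only catches records that are identical once case and surrounding spaces are ignored. Misspellings and abbreviations need the fuzzier comparisons suggested above.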

 

Hope this helps.

 

PS I'm one of the "quiet" members of this list so if someone knows more,
don't shoot me down in flames.

 

Regards,

 

 

   

Geoff Scott
Project Manager

Data Protection Officer

Security Officer

Quality Manager

 

+44 (0)191 243 6125

+44 (0)191 243 6101

Bede House, All Saints Business Park, Newcastle Upon Tyne. NE1 2ES

http://www.schin.ncl.ac.uk

[log in to unmask]

 


Disclaimer

If you received this in error, please contact the sender and delete the
material from any computer.  

The information transmitted reflects the thoughts and opinions of the
author and does not represent the thoughts or opinions of SCHIN Ltd and
is intended only for the person or entity to which it is addressed and
may contain confidential and/or privileged material. Any review,
retransmission, dissemination or other use of, or taking of any action
in reliance upon, this information by persons or entities other than the
intended recipient is prohibited.

 

 Hi, can anyone point me in the direction of any best practice guidance,
codes of practice etc relating to methods of, and technicalities
involved in, data cleansing and de-duplication?

 

Thanks

 

Chris

 

http://www.rac.co.uk

http://www.racbusiness.co.uk

http://www.bsm.co.uk

 

Any opinions expressed in this e-mail are those of the individual and
not necessarily the company. This e-mail and any attachments are
confidential to RAC and/or BSM and are solely for use by the intended
recipient.

 

If you are not the intended recipient you must not disclose, copy or
distribute its contents to any other person nor use its contents in any
way.

If you have received this e-mail in error please forward a copy of this
e-mail to "[log in to unmask]".

 

RAC Motoring Services: Registered England 1424399 VAT Reg No. GB
238640945 British School of Motoring: Registered England 291902 VAT Reg
No. GB 239505847 Registered Office(s): 1 Forest Road, Feltham, TW 13 7RR

 

This e-mail and any attachments have been scanned for the presence of
computer viruses. RAC/BSM accept no responsibility for computer viruses
once this e-mail has been transmitted.

 

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

       All archives of messages are stored permanently and are

      available to the world wide web community at large at

      http://www.jiscmail.ac.uk/lists/data-protection.html

      If you wish to leave this list please send the command

       leave data-protection to [log in to unmask]

            All user commands can be found at : -

        http://www.jiscmail.ac.uk/help/commandref.htm

  (all commands go to [log in to unmask] not the list please)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 

 

 

 

 

