Dear all,

this thread has just served to highlight one of the key issues in archaeology: There is a mixed level of understanding of the nature of data and databases. Database applications such as Access are really handy, but can also generate months of extra work if used inappropriately. 

The key thing to remember is to choose your tool to suit the job. For simple analysis of a simple dataset, a flat-file structure (single table) in a spreadsheet or stats package will generally suffice. For analysis of more complex datasets it is often essential, as Andrew Larcombe states, to use a more structured approach, something that excel, etc cannot handle at all. As Jenny Vaughan states, Access can be used to work with unstructured data stored in single tables and provides a number of tools not found in Excel, but then Excel is much better for moving blocks of data within tables as it has no concept of records simply rows and columns. Basically, it depends on what you're doing as to the way forward, but understanding the nature your data is key. 

The problem is that to get a structured (relational) database to work, you need to have at least a  basic understanding of data and data modelling techniques. Without this, the structure will generally not represent the data, but which MS Access wizards the user can use or how they like they forms to appear. This then results in a poorly designed system which cannot do what was intended in the first place, let alone provide a re-usable source of data. 

Jenny's suggestion to replicate a paper form is a good starting point, especially where only a single form is in use. It gets a bit more complex for multi-form datasets as there are fields on the form that are used to relate forms to each other or summarise other bits of data: Relationships should be stored as such and summaries should be dynamically generated from the data (ie total weight of pot by context should not be a field on the pottery table or the context table but a query definition based on the raw data tables).

Validation is a reason for using a database application such as Access over a spreadsheet or similar: If you want to control data entry using glossaries or the like, then a relational database is the way forward. Entering data into Excel then trying to import is fine as long as the users enter consistently valid data. 

And remember, principles are platform independent. Things that could be automated in dBase can certainly be automated using Access. Access is potentially the most flexible desktop database application on the market, with its support for Visual Basic, ActiveX objects and the like. Calling Access clunky, therefore, is unfair - a particular Access based application may be clunky due to poor design (I've seen plenty!!), but that's not the fault of the development environment, rather the developer. If you really want to criticise Access, pick on it's rather flaky multi-user support... (Again, right tool for the job - distributed system, choose of the big boys like SQL Server of Oracle).

Having said all that, I'd just like to add IT'S NOT THAT HARD OR COMPICATED!!! Most people, in my experience, do understand data issues once the basic concepts come to their attention. Successful data handling is all about modelling aspects of the real-world, something archaeologists do every day, which should make it easier, especially with the object/behaviour models of data which are becoming popular.

Paul's few basic database tips:
1. Keep your data atomic - if you can split a piece of data up into two bits of data (eg a phone number becomes an area code and a number) then it should be stored as two pieces of data not one. Remember, it's far easier to concatenate!

2. Use appropriate data types - use numerics where calculations are needed as you cannot perform calculations on strings. And don't use a string for a measurement field just so you can enter "1mm" or "0.5cm" - make a decision as to units of measurement and stick with it and keep numbers and strings separate! 

3. Try to build a data model that represents how the data actually is and enforce meaningful relationships (eg for context recording, use a Cuts table and a Deposits table to store data for cut and deposit type contexts and then relate them using their physical or stratigraphic relationships so we can say eg Cuts are filled with Deposits). 

3. Use look-up tables to control data entry wherever possible.

4. Try to avoid free-text fields apart from to support other fields. It is much easier to work with categorical data. For example, storing finds data in such a field (such as the ubiquitous 'comments' field)  makes the data virtually useless when it comes to plotting finds distributions.

5. Beware of memo objects. They are supported in varying ways by different applications, not always well (eg ArcView3.x).

Paul Cripps
GIS Specialist
English Heritage
Centre for Archaeology
Fort Cumberland
Fort Cumberland Road
Portsmouth, PO4 9LD
Tel: 02392 856765
Fax: 02392 856701
This e-mail is intended solely for the above-mentioned recipient
and it may contain confidential or privileged information. If you
have received it in error, please notify us immediately and delete
the e-mail. You must not copy, distribute, disclose or take any
action in reliance on it.

English Heritage
Telephone 020 7973 3000
Facsimile 020 7973 3001