1. Offering text standardization functions to help with (1) data cleaning, (2) reducing the volume of text information, and (3) merging data from different sources or with different character sets.
2. Ability to categorize information (text in particular, and tagged data more generally), using built-in or ad hoc taxonomies (with a customized number of categories and subcategories), together with clustering algorithms. A data record can be assigned to multiple categories.
3. Ability to efficiently store images, books, and records with high variance in length, possibly through an auxiliary file management system and data compression algorithms, accessible from the database.
4. Ability to process data locally on your own machine, or externally, especially for computer-intensive computations performed on a small number of records. Also, optimizing the use of cache systems for faster retrieval.
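As a rough illustration of item 1, here is a minimal Python sketch of what a text standardization function might look like. The function name standardize_text and the particular choices (NFKD normalization, accent stripping, case folding, whitespace collapsing) are assumptions made for this example, not features of any specific database. Stripping accents is lossy and may not suit every language or data set.

```python
import re
import unicodedata


def standardize_text(text: str) -> str:
    """Hypothetical helper: map text variants to one canonical form."""
    # NFKD splits accented letters into a base letter plus combining marks
    # and expands compatibility characters such as ligatures.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks (accents). This step is lossy by design.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case folding is a more aggressive form of lowercasing.
    folded = stripped.casefold()
    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", folded).strip()


if __name__ == "__main__":
    # Two encodings of the same word, as might arrive from different sources.
    precomposed = "Caf\u00e9"
    combining = "Cafe\u0301"
    print(standardize_text(precomposed) == standardize_text(combining))
```

After standardization, records that differ only in encoding, case, or spacing compare equal, which helps with cleaning and with merging data from different sources.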
Read full discussion at http://bit.ly/1grodYg