It's always nice to see a practical programmer. In fact, I use this method to read a large survey file with multiple record types: there are household-level, family-level, and person-level data. I have a program that reads the raw data, which has one fixed-size record per person and repeats the relevant household and family data on each person record. The file also contains longitudinal data for up to 48 months, and the file size is over 1 GB. The program that uses this data is a microsimulation model, which can create new variables on the file. I keep track of the variables on the file with an ASCII text file listing variable names and sizes.

I wanted to create read statements that would read this data quickly, so I ended up breaking the data into three different arrays: one each for household data, which occurs once per household; family records, which can occur multiple times per household; and person records, which can also occur multiple times per household. In the end, I wrote my program so that the file was written out as a binary file, with a few variables at the beginning which define the dimensions of my three arrays. The household array is one-dimensional; the family and person arrays are three-dimensional, dimensioned by the number of families, by the number of months, and finally by the number of variables in the family array. The person array is the same but dimensioned by persons instead of families.

So before I read the arrays, I have enough information to allocate arrays of exactly the size I need, and all I need to do is read the allocated arrays in a read statement that looks like:

READ(8,END=NNNN) HHLD, FAMILY, PERSON

It's an extremely fast read. Of course I use equivalences so I don't need to create six arrays. Now the powers that be don't seem to like equivalences, but it's not hard to keep track of the few real variables. I read the ASCII data dictionary in once at the beginning; this file tells me each variable's location in the variable dimension.
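In outline, the read side looks something like the sketch below. The names, the file name, and the exact dimension order are illustrative, not my actual code:

```fortran
! Sketch only: dimensions come first, then whole arrays in one read.
PROGRAM READSIM
  IMPLICIT NONE
  INTEGER :: NHVAR, NFAM, NPER, NMON, NFVAR, NPVAR
  REAL, ALLOCATABLE :: HHLD(:), FAMILY(:,:,:), PERSON(:,:,:)

  OPEN(UNIT=8, FILE='model.bin', FORM='UNFORMATTED', STATUS='OLD')

  ! First record: the dimensions of the three arrays.
  READ(8) NHVAR, NFAM, NPER, NMON, NFVAR, NPVAR

  ! Allocate exactly the sizes needed, then read each array whole.
  ALLOCATE(HHLD(NHVAR))
  ALLOCATE(FAMILY(NFAM, NMON, NFVAR))
  ALLOCATE(PERSON(NPER, NMON, NPVAR))

  READ(8, END=100) HHLD, FAMILY, PERSON
100 CONTINUE

  CLOSE(8)
END PROGRAM READSIM
```

The point is that by the time the big READ executes, every array already has exactly the right shape, so the whole file transfers in a single unformatted record.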
This way I don't need to have the input variables in a fixed location, since I can look up each location in my data dictionary once. I hope this gives you some idea of the flexibility of writing out entire arrays and associating a data dictionary with the data.

From: Jing Guo <[log in to unmask]>

At 05:00 PM 7/10/01 -0400, you wrote:
>
> > It's maybe a simple question.
>
>It may be a simple question, but there is no simple answer.
>
>If I were your programmer given this assignment, I would first ask you
>for the specifications of the file format and the data to be extracted
>from the file. I would say that the size of the file is not the same
>thing as the size of the data. There may or may not be a simple
>relationship between them.
>
>Assume the data in the file were written out as a flat data stream. In
>this case, one may be able to link the file size to the data size in a
>simple expression. Even so, one won't be able to tell whether the data
>is a simple 1-D array (a row or column vector) or a 2-D array (a
>matrix). All that information, if not specified in the file, would have
>to be specified somewhere else (even the software engineers of MATLAB
>have to either store dimensional information in a data file or ask
>users to specify it).
>
>Now it may be getting more controversial.
>
>For your specific problem, I would first suggest a long-term solution:
>redefine your output file format so that all dimensional information is
>the _first_ piece of data one can get from the file (why would anyone
>want to read a file twice?). That is simply because losing information
>is not a reversible adiabatic process. It is not always possible to
>reverse-engineer data dimensional information from file sizes.
>
>If you have no control over the output data format, and the file size
>data can easily be mapped to dimensional information, why don't you
>collect the file size data _before_ you use them?
>For instance, one can create a list of files with their sizes listed
>next to their names, such as:
>
>13579 "file1.dat"
>2468 "file2.dat"
>999 "file3.dat"
>
>I won't suggest any "smart" solutions in Fortran. In my programming
>life, I have tried to program some "smart" solutions myself. I now
>believe all those solutions were wrong. If the data integrity is
>broken, the best solution is to patch the data, not to create generic
>solutions that at best require some very case-specific information
>to work.
>
> > Suppose you have several data files, each of them with a different
> > length, which will be read in by your program. Usually you have to
> > specify the corresponding array length in your program; otherwise
> > you would run into trouble in the read. I want to know if there is
> > a means in Fortran to read a data file without specifying the size,
> > something like this in MATLAB:
> >
> > r1=myfile(:,1)
> >
> > or
> >
> > fscanf(fid,'%g %g',[2 inf]);
> >
> > Thus, all the data will be read in, and we can get the size of the
> > array with the simple command size(...).
> >
> > Many thanks.
> >
> > Yongcheng
>
>--
>________________________________ _-__-_-_ _-___---
>Jing Guo, [log in to unmask], (301)614-6172(o), (301)614-6297(fx)
>Data Assimilation Office, Code 910.3, NASA/GSFC, Greenbelt, MD 20771

Bob Cohen
(703) 534-7618
[log in to unmask]
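P.S. The file-size list Jing suggests doesn't have to be maintained by hand; it can be generated outside Fortran with standard Unix tools. A sketch (the dummy files here are just stand-ins matching Jing's example):

```shell
# Demo only: create three dummy data files of known sizes.
head -c 13579 /dev/zero > file1.dat
head -c 2468  /dev/zero > file2.dat
head -c 999   /dev/zero > file3.dat

# Build the 'size "filename"' list, so a program can look up sizes
# without having to open each file twice.
for f in *.dat; do
  printf '%d "%s"\n' "$(wc -c < "$f")" "$f"
done > filesizes.lst

cat filesizes.lst
```

A Fortran program can then read filesizes.lst as an ordinary formatted file and map each size to the expected array dimensions before opening the data file itself.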