Hi Zhijie

I checked the versions of todos and tounix that I used, and it turns out that my version of 'tounix' uses an even dumber method than the 'dumb' method you describe.  It does the equivalent of:

perl -pe 's/\r//' original+todos.mtz > original+todos+tounix.mtz


This is normally fine for a DOS text file, since it should not contain any \r except in the combination \r\n.  For an MTZ file, however, it is disastrous, since there are likely to be many lone \r's in the binary data.  In that case the code above removes the first \r on each line, instead of the one just before the \n.
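
To make the difference concrete, here is a minimal Python sketch. It is not the tounix source; the function names and the sample bytes are invented for illustration. It compares "remove the first \r in each line" with "remove only the \r directly before \n" on data that contains a lone \r:

```python
def strip_first_cr_per_line(data: bytes) -> bytes:
    """Mimic perl -pe 's/\\r//': remove only the first CR in each LF-terminated record."""
    records = data.split(b"\n")
    return b"\n".join(rec.replace(b"\r", b"", 1) for rec in records)


def strip_cr_before_lf(data: bytes) -> bytes:
    """Mimic perl -pe 's/\\r\\n/\\n/': remove a CR only when it directly precedes LF."""
    return data.replace(b"\r\n", b"\n")


# Made-up "binary" data with a lone CR, as it looks after a dumb LF -> CRLF pass.
dos_data = b"AB\rCD\r\nEF"

print(strip_first_cr_per_line(dos_data))  # the lone CR inside the data is removed
print(strip_cr_before_lf(dos_data))       # the lone CR survives, the CRLF becomes LF
```

The first function deletes a byte that was part of the data and leaves the inserted \r in place, which is why the result is neither the original file nor the same length as it would be after a correct reversal of every line.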

I tried it with your method and I got perfect recovery of the original MTZ file, as you observed.

Cheers

-- Ian

On 6 March 2015 at 17:28, Zhijie Li <[log in to unmask]> wrote:
Hi Ian,
 
Unix/Linux uses a single byte 0A (the linefeed character) as the line end marker for text, while DOS/Win use two bytes 0D 0A (carriage return + line feed). Old Mac systems use 0D. (what a mess!)
 
So a simple UNIX to DOS operation should expand every 0A in a file to 0D 0A:
perl -pe 's/\n/\r\n/' input.mtz > output.mtz
 
For an MTZ file, whose main body is an array of REAL*4 values, this operation will generate a lot of ‘insertional mutations’. No wonder the data is scrambled and the header cannot be found: its position is shifted, so it is no longer where the pointer points.
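
The shift can be illustrated with a toy file. This is only a sketch: the layout below (a 4-byte little-endian offset, some body bytes, then a "HEAD" marker) is invented for the example and is not the real MTZ layout.

```python
import struct


def unix_to_dos_dumb(data: bytes) -> bytes:
    """Equivalent of perl -pe 's/\\n/\\r\\n/': every 0A becomes 0D 0A."""
    return data.replace(b"\n", b"\r\n")


# Toy layout (hypothetical): [4-byte offset to header][body][header marker]
body = b"\x00\x0a\x41\x0a\x42\x43\x44\x45"   # 8 bytes, two of them are 0A
header = b"HEAD"
offset = 4 + len(body)                        # header starts at byte 12
toy = struct.pack("<I", offset) + body + header

converted = unix_to_dos_dumb(toy)
ptr = struct.unpack("<I", converted[:4])[0]   # the stored pointer is unchanged

print(toy[offset:offset + 4])        # header is where the pointer says
print(converted[ptr:ptr + 4])        # after conversion the pointer lands in the body
print(converted.find(b"HEAD"))       # the header has moved by one byte per 0A before it
```

In a real file the pointer bytes themselves could also contain 0A and be altered, which would make matters worse.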
 
During the conversion from UNIX to DOS, there is a complication: there could be a few 0D 0A already in the original file. Smarter unix2dos programs will keep these 0D 0A unchanged so that there won’t be weird-looking 0D 0D 0A (for example http://www.thefreecountry.com/tofrodos/ states this in its source file and has this behavior). Some dumber methods will simply convert all of them into 0D 0D 0A, for example:
perl -pe 's/\n/\r\n/' input.mtz > output.mtz
 
 
When converting from the DOS format back to UNIX format, in our binary file case, the dumber method will work but the smarter programs will cause problems. This is because, among all the 0D 0A in the DOS file generated by a smarter program, there is no way to tell which were 0A and which were already 0D 0A in the original, so they are all converted back to 0A. Therefore some 0D in the original file will be lost. With the dumber method, every 0D 0A in the DOS file was a plain 0A in the original (an original 0D 0A has become 0D 0D 0A), so there is no problem changing them all back to 0A.
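
A small Python sketch of the two round trips, assuming the "smart" conversion simply skips any 0A that is already preceded by 0D (the function names are made up, and real unix2dos-style tools may differ in detail):

```python
import re


def to_dos_dumb(data: bytes) -> bytes:
    """Every 0A becomes 0D 0A, even if a 0D is already in front of it."""
    return data.replace(b"\n", b"\r\n")


def to_dos_smart(data: bytes) -> bytes:
    """Only a 0A that is NOT already preceded by 0D gets a 0D inserted."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


def to_unix(data: bytes) -> bytes:
    """Every 0D 0A becomes 0A."""
    return data.replace(b"\r\n", b"\n")


# Made-up data containing one plain 0A and one pre-existing 0D 0A.
original = b"\x01\n\x02\r\n\x03"

dumb_round_trip = to_unix(to_dos_dumb(original))
smart_round_trip = to_unix(to_dos_smart(original))

print(dumb_round_trip == original)    # True: the dumb round trip is lossless
print(smart_round_trip == original)   # False: the original 0D is gone
print(len(original) - len(smart_round_trip))  # 1 byte lost per original 0D 0A
```

The dumb forward conversion is reversible because it is applied uniformly; the smart one merges two different inputs (0A and 0D 0A) into the same output, so no reverse step can separate them again.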
 
 
In a given MTZ, there will almost certainly be a lot of 0A. But 0D 0A could be rare or non-existent. So after a unix-dos then dos-unix conversion, the result depends on how many 0D 0A were there in the original file, and how the program did the UNIX-DOS conversion.
 
With things like the following, it should be OK:
perl -pe 's/\n/\r\n/' test.mtz > test1.mtz   
perl -pe 's/\r\n/\n/' test1.mtz > test2.mtz
 
To test the dumber (perl) method, I used an MTZ file that contains 7 0D 0A in the data section. Here is the result:
test.mtz   2030184 bytes : MTZdump OK
test1.mtz  2032486 bytes : MTZdump error
test2.mtz  2030184 bytes : MTZdump OK
cmp test.mtz test2.mtz   : the two files are identical
 
The same file through the smarter conversion (todos.mtz, then fromdos.mtz):
test.mtz    2030184 bytes : MTZdump OK
todos.mtz   2032479 bytes : MTZdump error (note the size difference compared to test1.mtz: the 7 0D 0A in the original file were kept unchanged)
fromdos.mtz 2030177 bytes : MTZdump error (the 7 original 0D 0A were shrunk to 0A)
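
The file sizes above are consistent with simple byte counting. In the sketch below the number of 0A bytes (2302) is not measured separately; it is inferred from the reported sizes of test1.mtz and test.mtz. The function name is made up for illustration.

```python
def predicted_sizes(original_size: int, n_lf: int, n_crlf: int):
    """Predict file sizes after each conversion step.

    n_lf   : total number of 0A bytes in the original file
    n_crlf : how many of those 0A are already preceded by 0D
    """
    dumb_dos = original_size + n_lf                # one 0D added per 0A
    dumb_back = dumb_dos - n_lf                    # one 0D removed per 0D 0A
    smart_dos = original_size + (n_lf - n_crlf)    # existing 0D 0A left alone
    smart_back = smart_dos - n_lf                  # every 0A is now 0D 0A, all shrink
    return dumb_dos, dumb_back, smart_dos, smart_back


original_size = 2030184
n_lf = 2032486 - original_size   # inferred from test1.mtz: 2302
n_crlf = 7                       # reported count of 0D 0A in the data section

print(predicted_sizes(original_size, n_lf, n_crlf))
# (2032486, 2030184, 2032479, 2030177)
```

These match test1.mtz, test2.mtz, todos.mtz and fromdos.mtz respectively.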
 
 
Ian, in your test with todos and tounix, it seems that the final MTZ still has header information at the correct location, so that MTZdump could read it. But some of the numbers saved in the data array seem damaged, so some of the stats in MTZdump were out of range. It would be interesting to read the todos and tounix source code to see why that happened. Or with a binary file comparison tool we might be able to guess the cause by taking a look at the two files.
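
If no binary comparison tool is at hand, a few lines of Python can at least locate where two files start to differ. This is only a sketch; the function name is made up and the file names in the commented usage are placeholders.

```python
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        # One file is a prefix of the other; they differ where the shorter one ends.
        return min(len(a), len(b))
    return None


# Example usage (placeholder file names):
# from pathlib import Path
# a = Path("original.mtz").read_bytes()
# b = Path("original+todos+tounix.mtz").read_bytes()
# print(first_difference(a, b))
```

Looking at the bytes around the reported offset in both files should show whether a 0D was dropped from the data or left behind in front of a 0A.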
 
Zhijie
 
 
 
 
 
 
From: Ian Tickle
Sent: Friday, March 06, 2015 5:59 AM
To: [log in to unmask]
Subject: Re: [ccp4bb] how to recover my data
 
Hi, just for fun and to demonstrate what can go horribly wrong if you blindly use utilities that were specifically designed for changing the line terminators in an ASCII file, I applied these utilities to an MTZ file.

First I used 'todos' to simulate what I suspect the technician has done:

todos < original.mtz > original+todos.mtz

Then I used 'tounix' in an attempt to recover the original file, as others have suggested:

tounix < original+todos.mtz > original+todos+tounix.mtz

What can possibly go wrong?  The log files from mtzdump are attached - see for yourself! (the mtzdump on original+todos.mtz went into an infinite loop and I had to kill the process).

Cheers

-- Ian


 
On 6 March 2015 at 07:20, Smith Lee <[log in to unmask]> wrote:
Dear All,
 
Regarding the recovery of the mtz file: I tried opening one specific mtz file with Excel, but after that all the mtz files on the computer show an Excel icon (X), although the file extension is still .mtz. If I then open one specific mtz file (with the Excel icon) with Notepad, all the mtz files show the Notepad icon. If I then open one specific mtz file (with the Notepad icon) with WordPad, all the mtz files show the WordPad icon.
 
I hope these clues can help you advise me on recovering the mtz files.
 
Smith


On Thursday, March 5, 2015 11:51 PM, Robbie Joosten <[log in to unmask]> wrote:


Hi Smith,

If this is really the problem Ian describes, you can try the Linux programs unix2dos and dos2unix to change the line endings. A potential source of the problem might be copying the file with certain (S)FTP clients: in 'text-mode' they change the line endings to your OS default to be user friendly.

Cheers,
Robbie

> -----Original Message-----
> From: CCP4 bulletin board [mailto:[log in to unmask]] On Behalf Of
> Ian Tickle
> Sent: Thursday, March 05, 2015 14:04
> To: [log in to unmask]
> Subject: Re: [ccp4bb] how to recover my data
>
> Hi Smith
>
>
> I sympathise with your plight - I have had to do similar things in the past for
> other people!  I think your most fruitful course of action would be to talk to
> the technician who recovered your data because only he knows what he
> actually did to recover it.
>
>
> From your description of your recovery of the PDB file it looks to me like a
> line terminator issue, i.e. was the original file created in Linux, Windows or
> Mac?  This is relevant because the line terminators are different and it
> sounds like the technician didn't simply copy the file, he changed the line
> terminators.  If he did the same with the MTZ file thinking it was a text file
> the additional line terminators would corrupt the binary data making it
> impossible to read with any of the CCP4 MTZ utilities.  If you can understand
> exactly what the technician did you may be able to reverse it and recover the
> binary data.
>
>
> Hope this helps!
>
>
> Cheers
>
>
> -- Ian
>
>
> On 5 March 2015 at 05:36, Smith Lee <00000459ef8548d5-dmarc-
> [log in to unmask]> wrote:
>
>
>
>     Dear All,
>
>     Recently my computer hardware broke and all the data
> was recovered to movable hardware by a technician. However, I find that the
> recovered PDB file and the MTZ file cannot be opened by Coot. I then opened
> the recovered PDB file in WordPad, copied it from WordPad to
> Notepad and saved it as a pdb file. I find that Coot can open the Notepad-saved
> pdb file, so my pdb files can be successfully recovered from the hardware.
>
>     But will you please tell me how to have Coot open my mtz file? After
> data recovery by the technician, the size of the mtz file did not decrease,
> so I think there is a way to recover it.
>
>     I have not noticed there were similar or identical posts as mine for
> recovery data before in the CCP4 mail list.
>
>     Thus I am looking forward to getting a reply from you on how to
> recover my mtz file.
>
>
>     Smith
>
>