OK, good news and bad news. I seem to have worked around the file open
problem only to hit another wall.
Analysis itself seems to have no problem with the large file; however, the
getNoiseEstimate subroutine really doesn't like it. Anything I tried to do
to that routine caused all sorts of problems - I'm not exactly a Python
guru. But simply bypassing the call to it "fixed" the problem and let me
open the file and adjust contours manually. Here's the fairly trivial diff
on Util.py:
afowler% diff Util.py Util.py.orig
1273c1273
< v = 60.0 / spectrum.root.currentAnalysisProject.globalContourScale
---
> v = 3 * getNoiseEstimate(spectrum) / spectrum.root.currentAnalysisProject.globalContourScale
The file opened very quickly with no errors. 60.0 as a guess was neither
particularly good nor bad, and I managed to find a reasonable contour level, etc.
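In case the hard-coded 60.0 offends anyone, here's a slightly less brutal
version of the same bypass (untested, and the 60.0 fallback is just my
arbitrary guess, not any kind of official default) that only kicks in when
getNoiseEstimate fails:

  # Sketch only: keep the normal noise-based level, but fall back to a
  # fixed level when the estimate dies on very large files.
  try:
      v = 3 * getNoiseEstimate(spectrum) / \
          spectrum.root.currentAnalysisProject.globalContourScale
  except Exception:
      v = 60.0 / spectrum.root.currentAnalysisProject.globalContourScale
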
The new problem is that for any file this size, I MUCH prefer to work with
contour files as opposed to calculating on the fly. Calculation is slow, as
expected for a 2 gig file. However, it got what I'd guess is ~1/4 of the way
through and tossed up a dialog with a save error, something about xextended
and yextended being bad. I think this is almost certainly a 32-bit error.
If there's a not-too-ugly way to get the contours program to deal with this,
great. In the meantime, based on another suggestion, I will reprocess the
data and split it into two separate datasets: one for the aromatic region,
for which I can "correct" the referencing in the aliased 13C dimension, and
one for the aliphatic region.
Thanks to Wayne for all the help,
Andrew
On 1/22/09 3:13 PM, "Wayne Boucher" <[log in to unmask]> wrote:
> One workaround I thought of, in case this is a 32-bit problem: you can
> split the file in two (using the imaginatively named command 'split').
> Well, you'd have to hit the block boundaries or that would be trouble.
> Then load each one, but you'd have to edit the two par files by hand to
> get the referencing right. So this is not a very elegant solution (and
> prone to arithmetic errors).
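>
> If you do try it, something like this back-of-envelope sketch would find
> a legal cut point for 'split -b' (untested; the sizes come from your
> .spc.par, and I'm assuming 4-byte floats with the last axis varying
> slowest, so treat it as arithmetic, not gospel):
>
>   import math
>
>   npts = (1616, 809, 512)   # points per axis, from the .spc.par
>   blk = (32, 16, 8)         # block sizes per axis, from the .spc.par
>   nblocks = [int(math.ceil(float(n) / b)) for n, b in zip(npts, blk)]
>   blockBytes = 32 * 16 * 8 * 4                       # 16384 bytes per block
>   slabBytes = nblocks[0] * nblocks[1] * blockBytes   # one slab of blocks
>   cutBytes = (nblocks[2] // 2) * slabBytes           # cut on a slab boundary
>   print(cutBytes)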
>
> Wayne
>
> On Thu, 22 Jan 2009, Andrew Fowler wrote:
>
>> Thanks again, Wayne. Hadn't thought of the 32-bit problem, but that could be
>> it. This is the largest dataset I've got, and the only one I've ever tried
>> to get into Analysis over 1.4 gigs. If you get a chance to try and import a
>> really large file, please let us all know what happens. I'm running OS X
>> (10.4.11), FWIW, but that really shouldn't be much different than Linux.
>>
>> I'll take a look at Util.py and getNoiseEstimate to see if I can work around
>> this one time. Otherwise, looks like I'll need to reprocess the dataset to
>> be smaller.
>>
>> I'll report back if I get this to work at all.
>>
>> Cheers,
>> Andrew
>>
>> On 1/22/09 2:52 PM, "Wayne Boucher" <[log in to unmask]> wrote:
>>
>>> Hmmm, 2.7 GB sounds like we might have a 32-bit problem (in the C world,
>>> not in the Python world). So possibly it's calculating some offset
>>> as negative rather than positive, and getting upset. That's my guess,
>>> anyway. If that's the case then it's likely to fail at the contouring (at
>>> least in certain regions) as well, even if it gets past the noise stage.
>>> Are you on Linux? I'll try and produce a suitably sized data set to try
>>> this.
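>>>
>>> A quick sanity check of that guess, assuming 4-byte floats and the
>>> 1616 x 809 x 512 grid you quoted:
>>>
>>>   total = 1616 * 809 * 512 * 4   # = 2,677,440,512 bytes
>>>   print(total > 2**31 - 1)       # True: bigger than a signed 32-bit offset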
>>>
>>> You can change the block size during conversion, so:
>>>
>>> pipe2azara pipeInFile azaraOutFile blockSize
>>>
>>> That would probably help with the speed if that was the only issue.
>>>
>>> The noise estimate is used in the function defaultContourLevels() in
>>> ccpnmr1.0/python/ccpnmr/analysis/Util.py. There's a line there that
>>> starts with:
>>>
>>> v = 3 * getNoiseEstimate...
>>>
>>> If you have a different way of doing it then that is the spot. This
>>> function is only called when the spectrum is first loaded. So you could
>>> edit this function for that one spectrum, load it, then edit it back. But
>>> like I said, there is probably a deeper problem here.
>>>
>>> Wayne
>>>
>>> On Thu, 22 Jan 2009, Andrew Fowler wrote:
>>>
>>>> Hi Wayne,
>>>>
>>>> Thanks for the input. The final matrix, after extracting the bits that
>>>> contain data in the 1H dimensions, is 1616 x 809 x 512 points (block sizes
>>>> of 32 x 16 x 8 according to the .spc.par file, for what I'm guessing is a
>>>> final block size of 4k). I've been converting from Pipe to Azara format for
>>>> all spectra going into Analysis for some time now and only tried importing
>>>> this as Pipe after getting the error several times, which seems to be
>>>> independent of data format.
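>>>>
>>>> For the record, the arithmetic behind my 4k guess (assuming 4-byte
>>>> floats):
>>>>
>>>>   print(32 * 16 * 8)       # 4096 points per block - the "4k"
>>>>   print(32 * 16 * 8 * 4)   # 16384 bytes per block on disk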
>>>>
>>>> The final dataset in either format is ~2.7 gigs - as you say, much bigger.
>>>> An HMQC-NOESY dataset (13C for the other proton) that's half the size in
>>>> the indirect proton, for a total of ~1.4 gig, opened fine.
>>>>
>>>> Slow isn't a problem, but not getting the file open is. I just tried
>>>> reconverting to Azara format (letting pipe2azara figure out the block
>>>> size) and got the same result. You have confirmed what I figured: it's
>>>> taking a random sample to calculate noise for the initial contour
>>>> threshold and dying on that for some reason.
>>>>
>>>> Is it possible to either change the block size during conversion (e.g.
>>>> work in 16k blocks) or not have Analysis try to calculate noise - maybe
>>>> force it to initially display at a very high contour level and then drop
>>>> it down manually?
>>>>
>>>> Any suggestions greatly appreciated.
>>>>
>>>> Thanks again,
>>>> Andrew
>>>>
>>>>
>>>> On 1/22/09 1:45 PM, "Wayne Boucher" <[log in to unmask]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> This could be a couple of things. First of all, I should say that that
>>>>> bit of the code with the error message is all about doing a random sample
>>>>> of the data to try and estimate the noise level of the spectrum. Since
>>>>> it's random, if something is going wrong then the exact failure point is
>>>>> likely to differ from attempt to attempt.
>>>>>
>>>>> That point (201 390 493) is pretty big: just containing it implies a data
>>>>> file of at least 201 x 390 x 493 points x 4 bytes, i.e. ~150 MB in size
>>>>> (and probably much bigger). So has the data file been truncated, perhaps?
>>>>> We should probably check the data file size against the number of points
>>>>> so we don't get this obscure error.
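>>>>>
>>>>> Something like this is what I have in mind (a sketch only - there is no
>>>>> checkDataFileSize in the current code):
>>>>>
>>>>>   import os
>>>>>
>>>>>   def checkDataFileSize(path, npoints, wordSize=4):
>>>>>       # npoints: points along each dimension, from the par file
>>>>>       expected = wordSize
>>>>>       for n in npoints:
>>>>>           expected *= n
>>>>>       actual = os.path.getsize(path)
>>>>>       if actual < expected:
>>>>>           raise IOError('data file truncated: %d bytes on disk, '
>>>>>                         '%d expected' % (actual, expected))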
>>>>>
>>>>> And I should say that for a large 3D data set like this, NmrPipe format
>>>>> will normally not be a good one to use because (at least in the examples
>>>>> I've seen) it is not blocked, which makes access pretty slow (that might be
>>>>> what you are observing), especially in the dimensions other than the
>>>>> first.
>>>>>
>>>>> Wayne
>>>>>
>>>>> On Thu, 22 Jan 2009, Andrew Fowler wrote:
>>>>>
>>>>>> I'm trying - unsuccessfully - to import a NOESY-13C-HMQC dataset into an
>>>>>> existing project. The data was processed and looks fine in NMRPipe. I am
>>>>>> using a fully updated version of Analysis v1. I haven't taken the plunge
>>>>>> to v2 yet.
>>>>>>
>>>>>> This experiment just does not want to import. I've tried it both as
>>>>>> NMRPipe data and converted into Azara format. I select the file in the
>>>>>> "open spectrum" dialog, and all goes well until what is normally the
>>>>>> end, where I set the experiment type to H_H[C].NOESY (I've also tried
>>>>>> H[C]_H.NOESY). At this point I close the dialog, which usually
>>>>>> "freezes" for a short time while it loads and displays the data. In
>>>>>> this case, however, the dialog immediately goes away, no spectrum
>>>>>> appears, and I get the following traceback:
>>>>>>
>>>>>> Exception in Tkinter callback
>>>>>> Traceback (most recent call last):
>>>>>> File "/sw/lib/python2.4/lib-tk/Tkinter.py", line 1345, in __call__
>>>>>> return self.func(*args)
>>>>>> File
>>>>>> "/usr/local/ccpnmr/ccpnmr1.0/python/ccpnmr/analysis/OpenSpectrumPopup.py",
>>>>>> line 227, in openSpectra
>>>>>> self.parent.finishInitSpectrum(spectrum)
>>>>>> File "/usr/local/ccpnmr/ccpnmr1.0/python/ccpnmr/analysis/Analysis.py",
>>>>>> line 1351, in finishInitSpectrum
>>>>>> self.initBlockFile(spectrum)
>>>>>> File "/usr/local/ccpnmr/ccpnmr1.0/python/ccpnmr/analysis/Analysis.py",
>>>>>> line 1188, in initBlockFile
>>>>>> Util.defaultContourLevels(spectrum)
>>>>>> File "/usr/local/ccpnmr/ccpnmr1.0/python/ccpnmr/analysis/Util.py", line
>>>>>> 1273, in defaultContourLevels
>>>>>> v = 3 * getNoiseEstimate(spectrum) /
>>>>>> spectrum.root.currentAnalysisProject.globalContourScale
>>>>>> File
>>>>>> "/usr/local/ccpnmr/ccpnmr1.0/python/ccpnmr/analysis/ExperimentBasic.py",
>>>>>> line 517, in getNoiseEstimate
>>>>>> d = block_file.getValue(pt)
>>>>>> BlockFile.error: could not get point: 201 390 493
>>>>>>
>>>>>> Any ideas? The point numbers in the final line change with different
>>>>>> attempts but the rest of the error is consistent. Final note is that this
>>>>>> will be experiment #24 in the project.
>>>>>>
>>>>>> Thanks,
>>>>>> Andrew
--
Dr. Andrew Fowler | University of Iowa
Associate Director | B291 Carver Biomedical Research Building
Medical NMR Facility | Iowa City, IA 52242
319-384-2937 (office) | 319-335-7273 (fax)
[log in to unmask] |