Indeed, these errors are always associated with the /tmp directory. I would never have guessed the cause could be so simple, but our /tmp folder often holds an overwhelming number of files (over 10k, though still well below the filesystem inode limit), so many that 'rm' complains about 'too many parameters' when it is passed a wildcard ('rm -r *'). We have other ways to delete the files, but this does suggest that the sheer number may be overwhelming some binaries, whether FSL or system ones. A large number of these files appear to be related to feat (e.g. /tmp/feat_zoPbjS). The majority are simply empty files; the others are design files (*.fsf, *.con, *.mat, and associated png and ppm files). Our /tmp directories are cleared each time we power cycle a node (however infrequently that may be), yet these files reappear, which seems to indicate that something is interfering with the garbage collection process.
If FSL programs are supposed to garbage collect their tmp files once they have finished running, do you know of any typical situation in which they might not, particularly one involving feat?
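For clearing the leftovers without hitting the wildcard limit, a minimal Python sketch follows. It walks the directory entry by entry instead of expanding 'rm -r *' on the command line. The function names, the "feat_" and "fsl_" prefixes (taken from the paths quoted in this thread) and the 24-hour age threshold are assumptions for illustration, not anything FSL provides; the age check is only a rough guard against deleting files that belong to jobs still running.

```python
import os
import shutil
import time


def stale_tmp_entries(tmp_dir="/tmp", prefixes=("feat_", "fsl_"), min_age_hours=24.0):
    """Return paths in tmp_dir whose names start with one of the prefixes and
    whose modification time is older than min_age_hours.

    The prefixes and the age threshold are assumptions; adjust them locally.
    """
    cutoff = time.time() - min_age_hours * 3600
    stale = []
    with os.scandir(tmp_dir) as entries:
        for entry in entries:
            if not entry.name.startswith(tuple(prefixes)):
                continue
            try:
                mtime = entry.stat(follow_symlinks=False).st_mtime
            except FileNotFoundError:
                continue  # removed by its owner while we were scanning
            if mtime < cutoff:
                stale.append(entry.path)
    return stale


def remove_entries(paths, dry_run=True):
    """Delete each path (file or directory tree); with dry_run only report.

    Returns the list of paths that were (or would be) removed.
    """
    handled = []
    for path in paths:
        if not dry_run:
            if os.path.isdir(path) and not os.path.islink(path):
                shutil.rmtree(path, ignore_errors=True)
            else:
                try:
                    os.remove(path)
                except FileNotFoundError:
                    pass  # already gone
        handled.append(path)
    return handled
```

A cautious way to use it is to call remove_entries(stale_tmp_entries("/tmp")) first, which only lists candidates because dry_run defaults to True, and to pass dry_run=False once the list looks right.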
Thanks,
Bogdan Petre
Departments of Integrated Science and Physiology
Northwestern University
On 12/14/10 17:33, Mark Jenkinson wrote:

Hi,

This is quite serious - as I'm sure you realise. The warning message means that the file is corrupt and does not contain enough data. In fact they only seem to contain about half. Are all the files that this happens to in /tmp?

If it is only happening under conditions of high load, then maybe your /tmp is unable to deal with the traffic/space/number-of-files, or maybe the mechanism that fsl is using to select unique filenames inside of /tmp is sometimes giving clashes (although this function - mkstemp - is very fundamental and so it seems unlikely given that it doesn't happen to other users). So I would first check whether it is always associated with /tmp or not. If it is then that at least gives us something to go on.

As for the missing /tmp directory - that is not surprising as our scripts and executables clean up any temporary directories in /tmp after they have run.

I hope this will help you track down the problem.

All the best,
Mark

On 14 Dec 2010, at 19:24, Bogdan Petre wrote:

Hey Everyone,

We've encountered issues reading *.nii.gz files which are not specific to any particular FSL script or dataset and have been produced by everything from melodic to fslmaths. An example error message is shown below:

/usr/local/fsl/bin/fsl_motion_outliers /home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat/prefiltered_func_data 0 /home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat/motion_outliers.txt
WARNING: nifti_read_buffer(/tmp/fsl_F4Bd5B_mc/fmri_mcf.nii.gz):
  data bytes needed = 990720
  data bytes input = 439262
  number missing = 551458 (set to 0)
WARNING: nifti_read_buffer(/tmp/fsl_F4Bd5B_mc/fmri_mcf.nii.gz):
  data bytes needed = 990720
  data bytes input = 439262
  number missing = 551458 (set to 0)

This issue has never occurred when running a single analysis, but occurs with varying frequency when running multiple analyses simultaneously.
Melodic fails the most frequently (sometimes >40% failure rate), and programs like fdt and fast were found to fail least often (~1% failure rate), while some programs have not failed at all (e.g. BET & SUSAN). The errors are not consistent: a script that has failed will not necessarily fail again when it is rerun. However, running a sufficiently large batch of simultaneous jobs (e.g. submitting 100 to SGE) will invariably result in such errors.

Nothing suspicious has been noted regarding the image files themselves either. Header information obtained using fslhd, together with fslstats -r -R output, is listed below for prefiltered_func_data.nii.gz (which is the input file used above to produce the error output).

fslhd:

bogdan@bob:/home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat$ fslhd prefiltered_func_data.nii.gz
filename        prefiltered_func_data.nii.gz
sizeof_hdr      348
data_type       FLOAT32
dim0            4
dim1            86
dim2            72
dim3            40
dim4            276
dim5            1
dim6            1
dim7            1
vox_units       mm
time_units      s
datatype        16
nbyper          4
bitpix          32
pixdim0         0.0000000000
pixdim1         2.9767441750
pixdim2         2.9767441750
pixdim3         3.0000000000
pixdim4         2.5499999523
pixdim5         0.0000000000
pixdim6         0.0000000000
pixdim7         0.0000000000
vox_offset      352
cal_max         0.0000
cal_min         0.0000
scl_slope       0.000000
scl_inter       0.000000
phase_dim       0
freq_dim        0
slice_dim       0
slice_name      Unknown
slice_code      0
slice_start     0
slice_end       0
slice_duration  0.000000
time_offset     0.000000
intent          Unknown
intent_code     0
intent_name
intent_p1       0.000000
intent_p2       0.000000
intent_p3       0.000000
qform_name      Scanner Anat
qform_code      1
qto_xyz:1       -2.976732 -0.000006 -0.008716 126.350861
qto_xyz:2       -0.000843 2.962765 0.290396 -194.549240
qto_xyz:3       -0.008607 -0.288146 2.985899 13.005946
qto_xyz:4       0.000000 0.000000 0.000000 1.000000
qform_xorient   Right-to-Left
qform_yorient   Posterior-to-Anterior
qform_zorient   Inferior-to-Superior
sform_name      Scanner Anat
sform_code      1
sto_xyz:1       -2.976731 0.000000 -0.008833 126.350861
sto_xyz:2       -0.000849 2.962765 0.290396 -194.549240
sto_xyz:3       -0.008724 -0.288146 2.985899 13.005946
sto_xyz:4       0.000000 0.000000 0.000000 1.000000
sform_xorient   Right-to-Left
sform_yorient   Posterior-to-Anterior
sform_zorient   Inferior-to-Superior
file_type       NIFTI-1+
file_code       1
descrip         FSL4.0
aux_file

fslstats -r -R:

bogdan@bob:/home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat$ fslstats prefiltered_func_data.nii.gz -r -R
16.176001 1493.583984 0.000000 2696.000000

After the script failed I looked for /tmp/fsl_F4Bd5B_mc/fmri_mcf.nii.gz but couldn't find it; in fact the directory /tmp/fsl_F4Bd5B_mc/ did not exist.

We run fsl on an ubuntu linux cluster where all our nodes are diskless and mount their filesystems over NFS. A memory stress test was also conducted to ensure there were no hardware errors on the processing nodes. In addition, two independent servers (hosting raid 1 arrays) with identical configurations have been used as the master nodes for the cluster, and with these processing nodes both produced the same errors under similar conditions (i.e. running multiple simultaneous analyses), so the hardware is unlikely to be related to these errors.

fslerrorreport was run in the directory containing the fsl_motion_outliers input.
Below is the output:

bogdan@bob:/home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat$ ^Crefiltered_func_data.nii.gz
bogdan@bob:/home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat$ fslerrorreport
/usr/local/fsl/bin/fslerrorreport: 92: quota: not found
cat: /home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat/report.log: No such file or directory
cat: /home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat/design.mat: No such file or directory
cat: /home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat/design.con: No such file or directory
cat: /home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat/design.fts: No such file or directory
ls: cannot access /home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat/stats: No such file or directory
######################################################################
#### MACHINE INFORMATION
######################################################################
Uname: Linux
df:
Filesystem 1K-blocks Used Available Use% Mounted on
165.124.111.159:/home 6341360640 5295659264 723578880 88% /home
quota:
Memory and Swap info:
MemTotal: 16459800 kB
MemFree: 5555752 kB
Buffers: 0 kB
Cached: 10029180 kB
SwapCached: 0 kB
Active: 4490240 kB
Inactive: 5891500 kB
Active(anon): 14508 kB
Inactive(anon): 343900 kB
Active(file): 4475732 kB
Inactive(file): 5547600 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB
Dirty: 4 kB
Writeback: 56 kB
AnonPages: 352536 kB
Mapped: 53980 kB
Slab: 144928 kB
SReclaimable: 111732 kB
SUnreclaim: 33196 kB
PageTables: 24448 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8229900 kB
Committed_AS: 1026412 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 399000 kB
VmallocChunk: 34359338875 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 7680 kB
DirectMap2M: 16760832 kB
######################################################################
#### ENVIRONMENT INFORMATION
######################################################################
FSLMACHTYPE=gnu_64-gcc4.4
FSLDIR=/usr/local/fsl
FSLTCLSH=/usr/local/fsl/bin/fsltclsh
FSLMULTIFILEQUIT=TRUE
FSLMACHINELIST=
FSLOUTPUTTYPE=NIFTI_GZ
FSLWISH=/usr/local/fsl/bin/fslwish
FSLREMOTECALL=
FSLCONFDIR=/usr/local/fsl/config
FSLLOCKDIR=
MATLABPATH=/usr/local/matlab_scripts:/usr/local/spm5:/usr/local/vbmtools
PATH=/home/sge/bin/lx24-amd64:/usr/local/freesurfer/bin:/usr/local/caret/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/home/sge/bin/lx24-amd64/:/usr/local/fsl/bin
MANPATH=/home/sge/man:/usr/share/man:/usr/local/share/man
######################################################################
#### DIRECTORY INFORMATION
######################################################################
PWD: /home/lejian/Gamble_preprocessing/Results/funct_test/con002/Scan2/gamb1/gamb1.feat
total 384400
drwxr-xr-x 15 lejian ava 4096 Dec 14 12:31 .
drwxr-xr-x 6 lejian ava 4096 Nov 29 13:12 ..
-rw-r--r-- 1 lejian ava 25348 Nov 30 10:08 confound_MWV.txt
-rw-r--r-- 1 lejian ava 28074 Nov 30 10:08 confound_MWVG.txt
-rw-r--r-- 1 lejian ava 357635 Dec 14 12:31 example_func.nii.gz
-rw-r--r-- 1 lejian ava 46665793 Nov 30 10:25 filtered_MWV.nii.gz
-rw-r--r-- 1 lejian ava 46664822 Nov 30 10:09 filtered_MWVG.nii.gz
-rw-r--r-- 1 lejian ava 46663373 Nov 30 10:25 filtered_MWVG_ICA_outliner.nii.gz
-rw-r--r-- 1 lejian ava 46661990 Nov 30 10:41 filtered_MWV_ICA_outliner.nii.gz
-rw-r--r-- 1 lejian ava 60792601 Nov 30 10:03 filtered_func_data.nii.gz
-rw-r--r-- 1 lejian ava 46340819 Nov 30 10:04 filtered_func_no_edge_data.nii.gz
-rw-r--r-- 1 lejian ava 6639 Nov 30 10:04 filtered_func_no_edge_data_mask.nii.gz
-rw-r--r-- 1 lejian ava 3997 Nov 30 09:54 mask.nii.gz
drwxr-xr-x 3 lejian ava 4096 Nov 30 09:47 mc
-rw-r--r-- 1 lejian ava 220426 Nov 30 10:03 mean_func.nii.gz
-rw-r--r-- 1 lejian ava 98591857 Dec 14 12:22 prefiltered_func_data.nii.gz
drwxr-x--- 2 lejian ava 12288 Nov 23 21:21 prefiltered_func_data_mcf.mat
drwxr-x--- 2 lejian ava 12288 Nov 24 09:13 prefiltered_func_data_mcf.mat+
drwxr-x--- 2 lejian ava 12288 Nov 28 13:41 prefiltered_func_data_mcf.mat++
drwxr-x--- 2 lejian ava 12288 Nov 28 15:08 prefiltered_func_data_mcf.mat+++
drwxr-x--- 2 lejian ava 12288 Nov 28 17:00 prefiltered_func_data_mcf.mat++++
drwxr-x--- 2 lejian ava 12288 Nov 28 17:49 prefiltered_func_data_mcf.mat+++++
drwxr-x--- 2 lejian ava 12288 Nov 28 18:11 prefiltered_func_data_mcf.mat++++++
drwxr-x--- 2 lejian ava 12288 Nov 29 14:15 prefiltered_func_data_mcf.mat+++++++
drwxr-x--- 2 lejian ava 12288 Nov 30 09:46 prefiltered_func_data_mcf.mat++++++++
drwxr-xr-x 2 lejian ava 4096 Nov 30 09:44 reg
drwxr-xr-x 2 lejian ava 4096 Nov 30 10:08 roi
drwxr-xr-x 2 lejian ava 4096 Nov 30 10:08 seg
######################################################################
#### FEAT INFORMATION
######################################################################
Report log:
######################################################################
Design Matrix:
######################################################################
Contrast Matrix:
######################################################################
FTS Matrix:
######################################################################
#### Main directory:
total 384400
drwxr-xr-x 15 lejian ava 4096 Dec 14 12:31 .
drwxr-xr-x 6 lejian ava 4096 Nov 29 13:12 ..
-rw-r--r-- 1 lejian ava 25348 Nov 30 10:08 confound_MWV.txt
-rw-r--r-- 1 lejian ava 28074 Nov 30 10:08 confound_MWVG.txt
-rw-r--r-- 1 lejian ava 357635 Dec 14 12:31 example_func.nii.gz
-rw-r--r-- 1 lejian ava 46665793 Nov 30 10:25 filtered_MWV.nii.gz
-rw-r--r-- 1 lejian ava 46664822 Nov 30 10:09 filtered_MWVG.nii.gz
-rw-r--r-- 1 lejian ava 46663373 Nov 30 10:25 filtered_MWVG_ICA_outliner.nii.gz
-rw-r--r-- 1 lejian ava 46661990 Nov 30 10:41 filtered_MWV_ICA_outliner.nii.gz
-rw-r--r-- 1 lejian ava 60792601 Nov 30 10:03 filtered_func_data.nii.gz
-rw-r--r-- 1 lejian ava 46340819 Nov 30 10:04 filtered_func_no_edge_data.nii.gz
-rw-r--r-- 1 lejian ava 6639 Nov 30 10:04 filtered_func_no_edge_data_mask.nii.gz
-rw-r--r-- 1 lejian ava 3997 Nov 30 09:54 mask.nii.gz
drwxr-xr-x 3 lejian ava 4096 Nov 30 09:47 mc
-rw-r--r-- 1 lejian ava 220426 Nov 30 10:03 mean_func.nii.gz
-rw-r--r-- 1 lejian ava 98591857 Dec 14 12:22 prefiltered_func_data.nii.gz
drwxr-x--- 2 lejian ava 12288 Nov 23 21:21 prefiltered_func_data_mcf.mat
drwxr-x--- 2 lejian ava 12288 Nov 24 09:13 prefiltered_func_data_mcf.mat+
drwxr-x--- 2 lejian ava 12288 Nov 28 13:41 prefiltered_func_data_mcf.mat++
drwxr-x--- 2 lejian ava 12288 Nov 28 15:08 prefiltered_func_data_mcf.mat+++
drwxr-x--- 2 lejian ava 12288 Nov 28 17:00 prefiltered_func_data_mcf.mat++++
drwxr-x--- 2 lejian ava 12288 Nov 28 17:49 prefiltered_func_data_mcf.mat+++++
drwxr-x--- 2 lejian ava 12288 Nov 28 18:11 prefiltered_func_data_mcf.mat++++++
drwxr-x--- 2 lejian ava 12288 Nov 29 14:15 prefiltered_func_data_mcf.mat+++++++
drwxr-x--- 2 lejian ava 12288 Nov 30 09:46 prefiltered_func_data_mcf.mat++++++++
drwxr-xr-x 2 lejian ava 4096 Nov 30 09:44 reg
drwxr-xr-x 2 lejian ava 4096 Nov 30 10:08 roi
drwxr-xr-x 2 lejian ava 4096 Nov 30 10:08 seg
######################################################################
#### Stats directory:
######################################################################
#### Reg directory:
total 33424
drwxr-xr-x 2 lejian ava 4096 Nov 30 09:44 .
drwxr-xr-x 15 lejian ava 4096 Dec 14 12:31 ..
-rw-r--r-- 1 lejian ava 138 Nov 30 09:40 example_func2highres.mat
-rw-r--r-- 1 lejian ava 22971814 Nov 30 09:41 example_func2highres.nii.gz
-rw-r--r-- 1 lejian ava 1519639 Nov 30 09:43 example_func2highres.png
-rw-r--r-- 1 lejian ava 808985 Nov 30 09:41 example_func2highres1.png
-rw-r--r-- 1 lejian ava 710605 Nov 30 09:43 example_func2highres2.png
-rw-r--r-- 1 lejian ava 133 Nov 30 09:44 example_func2standard.mat
-rw-r--r-- 1 lejian ava 2456896 Nov 30 09:44 example_func2standard.nii.gz
-rw-r--r-- 1 lejian ava 477763 Nov 30 09:44 example_func2standard.png
-rw-r--r-- 1 lejian ava 269418 Nov 30 09:44 example_func2standard1.png
-rw-r--r-- 1 lejian ava 207975 Nov 30 09:44 example_func2standard2.png
-rw-r--r-- 1 lejian ava 2382368 Nov 30 09:40 highres.nii.gz
-rw-r--r-- 1 lejian ava 144 Nov 30 09:41 highres2example_func.mat
-rw-r--r-- 1 lejian ava 142 Nov 30 09:44 highres2standard.mat
-rw-r--r-- 1 lejian ava 964872 Nov 30 09:44 highres2standard.nii.gz
-rw-r--r-- 1 lejian ava 448252 Nov 30 09:44 highres2standard.png
-rw-r--r-- 1 lejian ava 226162 Nov 30 09:44 highres2standard1.png
-rw-r--r-- 1 lejian ava 222143 Nov 30 09:44 highres2standard2.png
-rw-r--r-- 1 lejian ava 414189 Nov 30 09:40 standard.nii.gz
-rw-r--r-- 1 lejian ava 143 Nov 30 09:44 standard2example_func.mat
-rw-r--r-- 1 lejian ava 143 Nov 30 09:44 standard2highres.mat
This error report is saved in the file: /tmp/fsl_m6UaYm.gz

I was wondering if anybody here had ever experienced anything of this nature or might have any ideas regarding the cause? A copy of the entire directory containing the input files listed above can be found here: http://apkarianlab.northwestern.edu/gamb1.feat.tar.gz

Thanks in advance for any help,
Bogdan Petre
Northwestern University
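A note on the numbers in the warning: 990720 bytes is exactly one 86 x 72 x 40 volume at 4 bytes per voxel (86 * 72 * 40 * 4 = 990720), which matches dim1 to dim3 and bitpix in the fslhd listing, assuming the temporary fmri_mcf image shares that geometry. Of those bytes, 439262 arrived and 551458 were missing (439262 + 551458 = 990720), so roughly 44% of the volume was readable. The Python sketch below shows one way such a truncated .nii.gz could be detected independently of FSL: it decompresses the gzip stream, reads dim, bitpix and vox_offset from the 348-byte NIfTI-1 header, and compares the image bytes present with the bytes the header promises for the whole image. The function names are hypothetical, and it assumes a single-member gzip stream and a single-file (.nii) NIfTI-1 image.

```python
import struct
import zlib


def expected_volume_bytes(dims, bitpix):
    """Bytes in one 3D volume: nx * ny * nz * bytes per voxel."""
    nx, ny, nz = dims
    return nx * ny * nz * (bitpix // 8)


def nifti_gz_data_bytes(path, chunk_size=1 << 20):
    """Return (bytes_needed, bytes_found, stream_complete) for a .nii.gz file.

    bytes_needed covers the whole image (all volumes) as declared by the header;
    bytes_found is what could actually be decompressed after vox_offset.
    """
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)  # 16 + 15 selects gzip framing
    head = bytearray()
    total = 0
    with open(path, "rb") as f:
        while not decomp.eof:
            chunk = f.read(chunk_size)
            if not chunk:
                break  # ran out of input before the gzip end-of-stream marker
            out = decomp.decompress(chunk)
            if len(head) < 348:
                head.extend(out[:348 - len(head)])
            total += len(out)
    if len(head) < 348:
        raise ValueError("file ends before the 348-byte NIfTI-1 header is complete")

    # sizeof_hdr must read as 348 in the file's byte order
    if struct.unpack_from("<i", head, 0)[0] == 348:
        endian = "<"
    elif struct.unpack_from(">i", head, 0)[0] == 348:
        endian = ">"
    else:
        raise ValueError("not a NIfTI-1 header (sizeof_hdr != 348)")

    dim = struct.unpack_from(endian + "8h", head, 40)       # dim[0..7]
    bitpix = struct.unpack_from(endian + "h", head, 72)[0]   # bits per voxel
    vox_offset = int(struct.unpack_from(endian + "f", head, 108)[0])

    voxels = 1
    for n in dim[1:dim[0] + 1]:
        voxels *= n
    needed = voxels * bitpix // 8
    found = max(0, total - vox_offset)
    return needed, found, decomp.eof
```

Running nifti_gz_data_bytes on an output file right after a job finishes, and flagging any result where bytes_found is below bytes_needed or stream_complete is False, would catch this kind of failure at the point where it happens rather than in a later processing step. Data that is corrupt rather than merely cut short raises zlib.error instead.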