Hi Ming.
On Tue, 19 Sep 2023, Ming Yang wrote:
> May I ask that, is there any ways to speed up the visualization of a catalog? For example, I have to wait for quite some time to see a color-magnitude diagram of a catalog with 22m targets. Many thanks!~
... probably.
TL;DR: converting your input data to colfits is most likely going to help
(http://www.starlink.ac.uk/stilts/sun256/inColfits.html).
But for more detail, read on.
For reference, using a 22 Mrow 7-column FITS file stored on a
spinning disk on my 20-core machine, an initial 3-column
default scatter plot takes around 3 seconds, and subsequent frames
(pan/zoom with Sketch Frames option turned off) about 90ms.
If I restrict the plot processing to single-core operation
(topcat -Djava.util.concurrent.ForkJoinPool.common.parallelism=1)
the initial plot time is not affected, but subsequent frames take
about 750 ms. I don't need more than 1 GB of heap to do this.
The size of the plot on the screen can make a bit of difference.
You can get topcat to report these times by running it with
-verbose, or by looking at the INFO-level reports in the Log Window
(http://www.starlink.ac.uk/topcat/sun253/LogWindow.html).
Output may look something like:
INFO: Caching plot data: 1 table, 1 mask, 2 coords
INFO: Data time: 1822
INFO: BasicParallel - tasks: 128, time: 93
INFO: Zone: 0 - Layers: 1, Paper: PixelOverlay
INFO: Plan time: 93
INFO: Paint time: 5
INFO: BasicParallel - tasks: 128, time: 29
INFO: Count time: 30
where "Data time" is the once-only read/calculation of the
coordinate values to plot, and "Plan/Paint time" is the time
spent rendering each frame.
So if topcat is taking long enough for you to get bored
(tens of seconds?) to produce a scatter plot of 22 million rows,
there is probably some way to improve matters.
The most likely explanation is that one way or another it is having
to do actual disk I/O to assemble the plot each time.
In most cases that's not necessary. If your input file is
an uncompressed binary format like FITS, the OS maps the parts
of the file it needs into memory when it first reads them
and caches them in system buffers, so no further I/O is required:
subsequent reads come straight from memory and are fast.
If your input file is in a non-binary format like CSV or VOTable
(or gzipped FITS), topcat reads it into a temporary binary file
at load time (slow to load, so not recommended), but subsequently
behaves the same as for FITS.
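To illustrate the mechanism, here is a generic sketch of memory-mapped file access using numpy (this is an analogy for what the OS does with an uncompressed binary table, not topcat's actual code; the file name and table shape are made up):

```python
import os
import tempfile

import numpy as np

# Hypothetical 1000-row, 7-column table written as a row-major binary
# file, roughly how the data part of a FITS binary table sits on disk.
nrow, ncol = 1000, 7
data = np.arange(nrow * ncol, dtype=np.float64).reshape(nrow, ncol)
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
data.tofile(path)

# Memory-map the file: the OS pages in only the regions actually
# touched, and keeps them in its buffer cache, so repeat reads
# need no further disk I/O.
mm = np.memmap(path, dtype=np.float64, mode="r", shape=(nrow, ncol))
col2 = np.asarray(mm[:, 2])  # row-major layout: this touches every row's page
print(col2[:3])
```

The point to notice is that even reading a single column of a row-oriented file drags in pages covering the whole table, which is why the overall file size matters and not just the size of the columns you plot.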
Where this doesn't work well is when the cached parts of the file
are too large to fit in the system's available RAM.
The OS still caches each part as it goes along, but as it reads
and caches the later parts of the file, the earlier parts are
bumped out of the cache, and the next time they are needed it has
to go back to the disk again, which is slow.
That can happen if your java heap (-Xmx...) is close to the
size of RAM (definitely a bad idea) or if the parts of
the input file you need are close to the size of RAM.
Three 64-bit columns of a 22 Mrow file come to only about 0.5 GB,
which shouldn't be a problem, but if they are embedded in a table
with many columns, the whole file might be big enough to cause
trouble in this way.
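To put numbers on that (illustrative arithmetic only; the 40-column table is hypothetical):

```python
nrow = 22_000_000   # rows in the catalogue
nbytes = 8          # one 64-bit (double precision) value

# Just the three plotted columns:
needed = 3 * nrow * nbytes
print(needed / 1e9)       # about 0.53 GB: comfortably cacheable

# The same columns embedded in a (hypothetical) 40-column table of
# doubles, where a row-oriented file forces the whole thing through
# the cache:
whole_file = 40 * nrow * nbytes
print(whole_file / 1e9)   # about 7 GB: may no longer fit in RAM
```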
If that's the problem, the solution is to reorganise your input
file so that the three columns you need are all in the same
part of the file. To do that, convert it (before you load it
into topcat) to colfits format:
http://www.starlink.ac.uk/stilts/sun256/inColfits.html
e.g.
stilts tpipe in=xxx.fits out=xxx.colfits
(or just load it into topcat and save it as colfits).
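The reason colfits helps is that it stores the table column-by-column rather than row-by-row, so each column occupies one contiguous run of bytes in the file. A toy numpy illustration of the difference (this mimics the layouts, not the actual FITS/colfits serialization):

```python
import numpy as np

nrow, ncol = 100, 7
a = np.arange(nrow * ncol, dtype=np.float64).reshape(nrow, ncol)

row_major = a.tobytes(order="C")  # row-oriented, like a normal FITS table
col_major = a.tobytes(order="F")  # column-oriented, like colfits

# In the column-oriented layout, column 2 is one contiguous slice,
# so reading it touches only its own bytes:
start = 2 * nrow * 8
col2 = np.frombuffer(col_major[start:start + nrow * 8], dtype=np.float64)
print((col2 == a[:, 2]).all())

# In the row-oriented layout the same 100 values are scattered at a
# stride of ncol * 8 bytes, so reading them pulls in every row's page.
```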
Because of the way temporary disk space is used by this command you
might need to configure the scratch space usage by doing something
like
stilts -Djava.io.tmpdir=. tpipe ...
Let me know how you get on (it's possible that the problem is elsewhere).
I'm happy to go into more detail, since I'm very keen to have
topcat working efficiently for large and very large datasets,
which it's usually capable of; the same applies to other readers
who feel like better performance might be possible. But if the
discussion gets too detailed maybe get back to me off-list to
avoid too much noise here.
If there's still a problem, useful information would be: how long
plots actually take, what file format is in use, file size in
bytes/rows/columns, heap size, RAM size, whether you're using a
spinning or solid-state disk.
Congratulations to anybody who read this far!
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
[log in to unmask] https://www.star.bristol.ac.uk/mbt/