Hi Ming.
On Tue, 19 Sep 2023, Ming Yang wrote:
> May I ask that, is there any ways to speed up the visualization of a catalog? For example, I have to wait for quite some time to see a color-magnitude diagram of a catalog with 22m targets. Many thanks!~
... probably.
TL;DR: converting your input data to colfits is most likely going to help
(http://www.starlink.ac.uk/stilts/sun256/inColfits.html).
But for more detail, read on.
For reference, using a 22 Mrow 7-column FITS file stored on a
spinning disk on my 20-core machine, an initial 3-column
default scatter plot takes around 3 seconds, and subsequent frames
(pan/zoom with Sketch Frames option turned off) about 90ms.
If I restrict the plot processing to single-core operation
(topcat -Djava.util.concurrent.ForkJoinPool.common.parallelism=1)
the initial plot time is not affected, but subsequent frames take
about 750 ms. I don't need more than 1 GB of heap to do this.
The size of the plot on the screen can make a bit of difference.
You can get topcat to report these times by running it with
-verbose, or by looking at the INFO-level reports in the Log Window
(http://www.starlink.ac.uk/topcat/sun253/LogWindow.html).
Output may look something like:
INFO: Caching plot data: 1 table, 1 mask, 2 coords
INFO: Data time: 1822
INFO: BasicParallel - tasks: 128, time: 93
INFO: Zone: 0 - Layers: 1, Paper: PixelOverlay
INFO: Plan time: 93
INFO: Paint time: 5
INFO: BasicParallel - tasks: 128, time: 29
INFO: Count time: 30
where "Data time" is the once-only read/calculation of the
coordinate values to plot, and "Plan/Paint time" is the time
spent rendering each frame.
So if topcat is taking long enough for you to get bored
(tens of seconds?) to produce a scatter plot of 22 million rows,
there is probably some way to improve matters.
The most likely explanation is that one way or another it is having
to do actual disk I/O to assemble the plot each time.
In most cases that's not necessary. If your input file is
an uncompressed binary format like FITS, the OS maps the parts
of the file it needs into memory when it first reads them
and caches them in system buffers, so no further I/O is required:
subsequent reads come straight from memory and are fast.
If your input file is in a non-binary format like CSV or VOTable
(or gzipped FITS), topcat reads it into a temporary binary file
at load time (slow to load, so not recommended), but subsequently
behaves the same as for FITS.
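To illustrate the mechanism, here is a generic sketch of memory-mapped file access using numpy (this is an analogy for what the OS does with an uncompressed binary table, not topcat's actual code; the file name and table shape are made up):

```python
import os
import tempfile

import numpy as np

# Hypothetical 1000-row, 7-column table written as a row-major binary
# file, roughly how the data part of a FITS binary table sits on disk.
nrow, ncol = 1000, 7
data = np.arange(nrow * ncol, dtype=np.float64).reshape(nrow, ncol)
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
data.tofile(path)

# Memory-map the file: the OS pages in only the regions actually
# touched, and keeps them in its buffer cache, so repeat reads
# need no further disk I/O.
mm = np.memmap(path, dtype=np.float64, mode="r", shape=(nrow, ncol))
col2 = np.asarray(mm[:, 2])  # row-major layout: this touches every row's page
print(col2[:3])
```

The point to notice is that even reading a single column of a row-oriented file drags in pages covering the whole table, which is why the overall file size matters and not just the size of the columns you plot.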
Where this doesn't work well is when the cached parts of the file
are too large to fit in the system's available RAM.
The OS still caches each part as it goes along, but as it reads
and caches the later parts of the file, the earlier parts are
bumped out of the cache, and the next time they are needed it has
to go back to the disk again, which is slow.
That can happen if your java heap (-Xmx...) is close to the
size of RAM (definitely a bad idea) or if the parts of
the input file you need are close to the size of RAM.
Three 64-bit columns of a 22 Mrow file come to only about 0.5 GB,
which shouldn't be a problem, but if they are embedded in a table
with many columns, the whole file might be big enough to cause
trouble in this way.
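To put numbers on that (illustrative arithmetic only; the 40-column table is hypothetical):

```python
nrow = 22_000_000   # rows in the catalogue
nbytes = 8          # one 64-bit (double precision) value

# Just the three plotted columns:
needed = 3 * nrow * nbytes
print(needed / 1e9)       # about 0.53 GB: comfortably cacheable

# The same columns embedded in a (hypothetical) 40-column table of
# doubles, where a row-oriented file forces the whole thing through
# the cache:
whole_file = 40 * nrow * nbytes
print(whole_file / 1e9)   # about 7 GB: may no longer fit in RAM
```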
If that's the problem, the solution is to reorganise your input
file so that the three columns you need are all in the same
part of the file. To do that, convert it (before you load it
into topcat) to colfits format:
http://www.starlink.ac.uk/stilts/sun256/inColfits.html
e.g.
stilts tpipe in=xxx.fits out=xxx.colfits
(or just load it into topcat and save it as colfits).
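The reason colfits helps is that it stores the table column-by-column rather than row-by-row, so each column occupies one contiguous run of bytes in the file. A toy numpy illustration of the difference (this mimics the layouts, not the actual FITS/colfits serialization):

```python
import numpy as np

nrow, ncol = 100, 7
a = np.arange(nrow * ncol, dtype=np.float64).reshape(nrow, ncol)

row_major = a.tobytes(order="C")  # row-oriented, like a normal FITS table
col_major = a.tobytes(order="F")  # column-oriented, like colfits

# In the column-oriented layout, column 2 is one contiguous slice,
# so reading it touches only its own bytes:
start = 2 * nrow * 8
col2 = np.frombuffer(col_major[start:start + nrow * 8], dtype=np.float64)
print((col2 == a[:, 2]).all())

# In the row-oriented layout the same 100 values are scattered at a
# stride of ncol * 8 bytes, so reading them pulls in every row's page.
```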
Because of the way temporary disk space is used by this command you
might need to configure the scratch space usage by doing something
like
stilts -Djava.io.tmpdir=. tpipe ...
Let me know how you get on (it's possible that the problem is elsewhere).
I'm happy to go into more detail, since I'm very keen to have
topcat working efficiently for large and very large datasets,
which it's usually capable of; the same applies to other readers
who feel like better performance might be possible. But if the
discussion gets too detailed maybe get back to me off-list to
avoid too much noise here.
If there's still a problem, useful information would be: how long
plots actually take, what file format is in use, file size in
bytes/rows/columns, heap size, RAM size, whether you're using a
spinning or solid-state disk.
Congratulations to anybody who read this far!
Mark
--
Mark Taylor Astronomical Programmer Physics, Bristol University, UK
[log in to unmask] https://www.star.bristol.ac.uk/mbt/