Hello,
Just got back from EuroPython, where I gave a talk about the software
underlying the APIs. There was one question about how slow it is to
load 270,000 lines of API code. I mentioned that the API is clever
enough to load only what it needs, when it needs it (true for the code,
and also for data). Of course with Analysis you are pretty much
guaranteed to load the Nmr package, and once you include that and the
other packages which get loaded as a result (because Nmr depends on
them), it amounts to a large fraction of the 270k. Yet I had never
noticed much delay, so I was sure it must have been of the order of a
second.
The first time you run Analysis (or the FormatConverter, or whatever),
Python needs to compile the code to bytecode. If you just run "python"
it creates *.pyc files; if you run "python -O" it creates *.pyo files.
(The analysis shell script in the bin directory of the release code
uses -O.) So the first time around the loading is slow, but the second
and subsequent times it should be fast.
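As an aside, you can trigger that compilation by hand if you want to
see exactly what gets written. Here is a minimal sketch using the
standard py_compile module (the file name is just a placeholder):

  import py_compile

  # Compiles the source and writes the bytecode file next to it:
  # DbRef.pyc under plain "python", DbRef.pyo under "python -O".
  py_compile.compile('DbRef.py')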
I've just done some timings. On my three-year-old Linux box (1 GHz,
although the disk matters more and I haven't a clue about that) it took
around 15 seconds to load the Nmr package with "python" and around 19
seconds with "python -O" the first time around (i.e. with no existing
*.pyc or *.pyo files). The second time around (with the *.pyc and *.pyo
files already in place) it took around 1.3 seconds in both cases.
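If you want to reproduce that sort of timing yourself, something along
these lines works from a Python prompt (ccp.api.Nmr is just my stand-in
for whatever package you actually want to measure, so substitute your
own):

  import time

  start = time.time()
  import ccp.api.Nmr        # stand-in for the package you actually load
  print("import took %.1f seconds" % (time.time() - start))

Run it once to include the compilation, and again in a fresh python to
see the speed with the *.pyc/*.pyo files already in place.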
I then wondered whether the performance might suffer over a network,
since it's possible to install Analysis (or whatever) on a disk hanging
off one computer and run it from another. The *.py and *.pyc files come
to around 7 MB in total and the *.pyo files to around 6 MB (I'm only
talking about the API code, but that is the largest chunk). If you have
a really crappy network this might start to cause a problem.
But something more to worry about for most people than the network is
file permissions and related issues. If Analysis is installed in
/usr/local (say) by root, and root never creates the *.pyc or *.pyo
files, then nobody else using the code will have write permission to
create them in those directories. In that case Python has to compile
the code in memory each and every time, which as we have seen is slow.
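One way round that (a sketch, assuming the installation lives somewhere
like /usr/local/ccpnmr; adjust the path to wherever yours actually is)
is for whoever owns the installation to compile everything once, using
the standard compileall module:

  import compileall

  # Walks the installed tree and writes the *.pyc files next to the
  # sources, so ordinary users never need write permission there.
  compileall.compile_dir('/usr/local/ccpnmr/python')

Run it once under plain "python" and once under "python -O" if you want
both the *.pyc and the *.pyo files.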
Another problem, which I have just encountered, is that even when the
*.pyo and *.pyc files exist, Python sometimes cannot use them (in my
case it complained about a "bad mtime": the bytecode file records the
modification time of the source it was compiled from, and if that no
longer matches, Python falls back to recompiling). So again it was
having to compile all the code each and every time.
One way to check which code is being used in Analysis is to type, at
the Python shell prompt that it provides:
>>> sys.modules
(sys is already imported, so you do not have to import it yourself.)
Your listing may vary in order (it's a dictionary), but in mine, for
example, the last entry listed was for 'ccp.api.DbRef', with value

<module 'ccp.api.DbRef' from
'/edl2/wb104/ccpnmr/ccpnmr1.0/python/ccp/api/DbRef.py'>

and since that is a *.py rather than a *.pyc or *.pyo file, I know that
means trouble. (If you run python with -v you get a whole slew of
messages if you have problems loading things up.)
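If you don't fancy eyeballing the whole dictionary, a quick sketch
along these lines lists any modules that came straight from *.py files
(and were therefore compiled in memory):

  import sys

  # A module whose __file__ ends in ".py" was compiled from source
  # rather than loaded from an existing *.pyc/*.pyo file.
  for name, module in sys.modules.items():
      filename = getattr(module, '__file__', '')
      if filename and filename.endswith('.py'):
          print("%s loaded from %s" % (name, filename))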
Of course a related issue is where the data is stored. Needless to say
it's always going to be faster if your data is on a local disk or
reachable via a bloody fast network, but people are already used to
that problem. The code-loading problem is a new issue that arises
specifically with Python.
Wayne