[WP3] AWBASE and AssociateList

Fri Mar 19 12:18:14 CET 2004

Hi Folks,

I just added the method get_attributes_of_singles to the AssociateList class and
updated the Manual accordingly. I will now start working on the problem Mark
reported yesterday.
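
As a rough illustration of why filling whole columns in one query beats
fetching row by row, here is a minimal sketch. It is not the AssociateList
implementation: it uses Python's built-in sqlite3 module as a stand-in for
Oracle, it is written in Python 3 rather than the Python 2 of the quoted
code below, and the table layout and the function names (make_demo_db,
fetch_row_by_row, fetch_in_one_query) are hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a small in-memory table standing in for a source list.

    The schema (SID, RA, DEC) is an assumption made for this sketch only.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute('CREATE TABLE sources (SID INTEGER PRIMARY KEY, RA REAL, "DEC" REAL)')
    rows = [(sid, 84.5 + 0.001 * sid, -2.0 + 0.001 * sid) for sid in range(1000)]
    conn.executemany("INSERT INTO sources VALUES (?, ?, ?)", rows)
    return conn


def fetch_row_by_row(conn, sids):
    """Slow pattern: one query (one round trip) per source."""
    result = {"SID": [], "RA": [], "DEC": []}
    queries = 0
    for sid in sids:
        row = conn.execute(
            'SELECT RA, "DEC" FROM sources WHERE SID = ?', (sid,)
        ).fetchone()
        queries += 1
        result["SID"].append(sid)
        result["RA"].append(row[0])
        result["DEC"].append(row[1])
    return result, queries


def fetch_in_one_query(conn, sids):
    """Fast pattern: a single query that fills the requested columns.

    Assumes sids is sorted and free of duplicates, so the ORDER BY below
    returns the rows in the same order as the input list.
    """
    placeholders = ",".join("?" * len(sids))
    cursor = conn.execute(
        'SELECT SID, RA, "DEC" FROM sources WHERE SID IN (%s) ORDER BY SID'
        % placeholders,
        list(sids),
    )
    result = {"SID": [], "RA": [], "DEC": []}
    for sid, ra, dec in cursor:
        result["SID"].append(sid)
        result["RA"].append(ra)
        result["DEC"].append(dec)
    return result, 1


if __name__ == "__main__":
    conn = make_demo_db()
    sids = [4, 8, 12, 13, 14]
    per_row, n_per_row = fetch_row_by_row(conn, sids)
    bulk, n_bulk = fetch_in_one_query(conn, sids)
    print("row by row:", n_per_row, "queries; one query:", n_bulk)
    print("same data:", per_row == bulk)
```

With an in-memory database the time difference is small; against a remote
database every extra query adds a network round trip, which is where the
per-item delays in the timings quoted below would come from.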

Cheers,

Kor

Kor G. Begeman wrote:
> Fedor,
> 
> the slow response is mainly due to the Python interface to Oracle. Fetching
> each row separately takes a lot more time than fetching the complete column,
> so just before my holidays two weeks ago I implemented the function
> get_attributes_of_pairs. The following example shows the difference,
> starting with your example code:
> 
> ++++++++++++++++++++++++++++++++++++++++
> from Experimental.AssociateList import *
> from time import *
> 
> # First Fedor's test
> AL = (AssociateList.ALID == 40)[0]
> t=time()
> ( ( SL1, SID1 ), ( SL2, SID2 ) ) = AL.get_pairs()
> print 'get_pairs took ', time()-t, ' seconds'
> t1=time()
> for i in range(5):
>      t=time()
>      print i, SID1[i], SL1.sources[SID1[i]]['RA'], time()-t
> print "----------------"
> for i in range(10):
>      t=time()
>      print i, SID1[i], SL1.sources[SID1[i]]['RA'], time()-t
> print 'Total time needed: ', time()-t1, ' seconds'
> 
> d1 = { 'RA': [], 'DEC': [], 'SID': [] }
> 
> t=time()
> 
> aids = AL.get_attributes_of_pairs( d1 )
> print 'get_attributes_of_pairs took ', time()-t, ' seconds'
> 
> t1=time()
> for i in range(5):
>      t=time()
>      print i, SID1[i], d1['RA'][i], time()-t
> print "----------------"
> for i in range(10):
>      t=time()
>      print i, SID1[i], d1['RA'][i], time()-t
> 
> print 'Total time needed: ', time()-t1, ' seconds'
> ----------------------------------------
> 
> The result of the above code is the following:
> 
> ++++++++++++++++++++++++++++++++++++++++
> get_pairs took  21.3978278637  seconds
> 0 4 85.030808326 18.9332048893
> 1 4 85.030808326 0.000309944152832
> 2 8 84.97373298 5.94629907608
> 3 12 85.0029018978 5.70203304291
> 4 13 84.5810009913 5.91029787064
> ----------------
> 0 4 85.030808326 0.000305891036987
> 1 4 85.030808326 0.00025200843811
> 2 8 84.97373298 0.000249147415161
> 3 12 85.0029018978 0.000250101089478
> 4 13 84.5810009913 0.000272989273071
> 5 14 84.5800842374 5.9364991188
> 6 16 85.0032172429 5.71908092499
> 7 18 84.9222978506 5.75219202042
> 8 23 84.9953589695 6.23765897751
> 9 26 84.5978693438 6.03286409378
> Total time needed:  66.1724839211  seconds
> get_attributes_of_pairs took  32.2162249088  seconds
> 0 4 85.030808326 9.17911529541e-05
> 1 4 85.030808326 6.07967376709e-05
> 2 8 84.97373298 5.48362731934e-05
> 3 12 85.0029018978 5.29289245605e-05
> 4 13 84.5810009913 5.3882598877e-05
> ----------------
> 0 4 85.030808326 5.69820404053e-05
> 1 4 85.030808326 5.31673431396e-05
> 2 8 84.97373298 5.41210174561e-05
> 3 12 85.0029018978 5.29289245605e-05
> 4 13 84.5810009913 5.31673431396e-05
> 5 14 84.5800842374 5.19752502441e-05
> 6 16 85.0032172429 5.29289245605e-05
> 7 18 84.9222978506 5.41210174561e-05
> 8 23 84.9953589695 5.31673431396e-05
> 9 26 84.5978693438 5.19752502441e-05
> Total time needed:  0.00702881813049  seconds
> ----------------------------------------
> 
> Your suggestion of retrieving all sourcelist data into memory is not
> advisable, since it might eat up all available memory on your machine.
> Retrieving just the data that you really want in one Oracle query seems
> to be a much better solution.
> 
> At the moment I am working on implementing something similar for singles,
> get_attributes_of_singles.
> 
> 
> I hope this helps in speeding up things,
> 
> 
> Kor.
> 
> 
> Fedor I. Getman wrote:
> 
>> Good day Danny,
>>
>> We are trying to match some catalogs and found a problem: when we
>> access the list of resulting pairs produced by AssociateList.get_pairs
>> (the same holds for singles), the response is very slow (approx. 2-4 sec
>> to initialise each list item the first time it is accessed).
>>
>> we use something like:
>>
>> from Experimental.AssociateList import *
>> from time import *
>> AL = (AssociateList.ALID == 4)[0]
>> ( ( SL1, SID1 ), ( SL2, SID2 ) ) = AL.get_pairs()
>> for i in range(5):
>>      t=time()
>>      print i, SID1[i], SL1.sources[SID1[i]]['RA'], time()-t
>> print "----------------"
>> for i in range(10):
>>      t=time()
>>      print i, SID1[i], SL1.sources[SID1[i]]['RA'], time()-t
>>
>>
>>
>> and a typical result:
>>
>> 0 245 186.202586063 17.0450201035
>> 1 248 186.025597431 2.86219286919
>> 2 249 186.172369307 3.74087810516
>> 3 250 186.062742686 3.92979979515
>> 4 255 186.188157002 3.66110897064
>> ----------------
>> 0 245 186.202586063 0.000329971313477
>> 1 248 186.025597431 0.000154972076416
>> 2 249 186.172369307 0.000148057937622
>> 3 250 186.062742686 0.000146150588989
>> 4 255 186.188157002 0.000161170959473
>> 5 256 186.300818927 4.09186816216
>> 6 257 185.890934955 3.21303105354
>> 7 259 186.032008037 3.62871098518
>> 8 262 186.183697329 3.29224586487
>> 9 266 185.838021412 3.08625602722
>>
>> It seems that awe issues an SQL query to initialise each object the first
>> time it is accessed. Can you or Kor change this and fill the list with data
>> from the DB when the list object is created? Or add a method "fill" or
>> "retrieve"?
>>
>> I hope that one SQL request and parsing its result will be much faster than
>> doing thousands of SQL requests, one for each list item.
>>
>>
>> On Wed, 17 Mar 2004, Danny R. Boxhoorn wrote:
>>
>>
>>> Hello again Fedor,
>>>
>>> This morning Ewout and I discussed the filenaming convention and
>>> consequently had a look at the use of swarp in RegriddedFrame. We came
>>> to the conclusion that a "clean" solution using symbolic links for the
>>> weight frames was feasible. The solution is "clean" in the sense that
>>> the SwarpConfig in the database contains the configuration that was
>>> used to run swarp. Ewout has implemented this, so it's now in cvs as
>>> AWBASE. The implementation also uses links to the RegriddedFrames (or
>>> ScienceFrames), which means you can coadd about 450 images (I think)
>>> before you hit the next limit.
>>>
>>> Ciao,
>>>
>>>                                                   Danny
>>>
>>>
>>> On Tue, Mar 16, 2004 at 11:35:08PM +0100, Fedor Getman wrote:
>>>
>>>> Good evening Danny!
>>>>
>>>> On Tue, 16 Mar 2004, Danny R. Boxhoorn wrote:
>>>>
>>>>
>>>>> Please tell Emmanuel directly that you'd like to see the limits
>>>>> increased (and that swarp should at least give a decent warning
>>>>> when a limit is exceeded [)
>>>>
>>>>
>>>> Ok.
>>>>
>>>>
>>>>>> Enlarging the string buffer does not completely solve our problem.
>>>>>> As you can see, the correct way is to use not 256 but PATH_MAX
>>>>>> (4095 on Linux) as the maximum filename length. Also, in swarp the
>>>>>> maximum number of files, defined by MAXINFIELD, is 8192. So we get
>>>>>> 4095*8192 bytes = 32 MB stored on the stack. But the default stack
>>>>>> allocation is 8 MB, hence the "Segmentation fault" :(
>>>>>>
>>>>>> (There is another restriction: ARG_MAX for exec() allows only 128 kB
>>>>>> of data to be passed to an external command.)
>>>>>
>>>>>
>>>>> True and that's a problem I'd like Emmanuel to solve. We simply didn't
>>>>> want to wait for him to do that and went for the quickest solution
>>>>> that worked. You're free to not use swarp until it gets fixed ... `)
>>>>
>>>>
>>>> :)
>>>>
>>>> Yes, as a temporary workaround it works. But for the future I would
>>>> prefer a more robust solution. Maybe Bertin could implement
>>>> multi-line definitions for keywords.
>>>>
>>>>
>>>>
>>>>>> In swarp there is a workaround: weight map files must have the same
>>>>>> basename as the science files, with a different extension (defined
>>>>>> in swarp.conf, default .weight.fits). So after retrieving the weight
>>>>>> map FITS files, we can link (or rename) them to something like
>>>>>> Sci-TIG-WFI-----#854-ccd57-Regr--Sci-53079.4259863.weight.fits
>>>>>> instead of
>>>>>> Cal-TIG-WFI-----#854-ccd57-Regr--Wei-53079.4259863.fits
>>>>>>
>>>>>> Or we can change the filename convention and put the "weight" keyword
>>>>>> in as a pre-suffix.
>>>>>
>>>>>
>>>>> The link or rename will not work because the SwarpConfig is also
>>>>> stored in the database and it would refer to the "wrong" weight image.
>>>>
>>>>
>>>> With the "suffix" definition of weight map files, we put only the
>>>> suffix in the config, not the list of filenames.
>>>>
>>>>
>>>>> As far as I'm concerned it's fine to change the filename convention
>>>>> and I'll talk to Erik about it tomorrow.
>>>>
>>>>
>>>> Good luck!
>>>>
>>>>
>>>>>> An analogous problem could also arise for the FSCALE_DEFAULT list.
>>>>>> Why not put this parameter in the header of the science FITS file?
>>>>>
>>>>>
>>>>> Indeed, now I do not remember what the reason was to avoid doing that.
>>>>> Could you please implement it and commit it if it works?
>>>>
>>>>
>>>> I will do and test this tomorrow.
>>>>
>>>> Ciao,
>>>>     Fedor
>>>>
>>>> ----------------------------------------
>>>>            Fedor I. Getman
>>>> ----------------------------------------
>>>> INAF (Istituto Nazionale di AstroFisica)
>>>> Osservatorio Astronomico di Capodimonte
>>>> via Moiariello 16, I-80131 Napoli, Italy
>>>> ----------------------------------------
>>>> tel/fax:  +39-081-5575445/456710
>>>> e-mail:   tig at na.astro.it
>>>> ----------------------------------------
>>>
>>>
>>
> 
> 

-- 

Dr. K.G. Begeman
OmegaCEN
Kapteyn Institute
University of Groningen
Postbus 800                         NL-9700 AV Groningen
Landleven 12                        NL-9747 AD Groningen
The Netherlands
Telephone                           +31-(0)50-3634059/4073
Telefax                             +31-(0)50-3636100
e-Mail                              kgb at astro.rug.nl
WWW                                 http://www.astro.rug.nl/~kgb