API¶
Base class for parent sample.
Base class for WISE Data
WISEData class to bin lightcurve by visits.
A class to download WISE data with multiple threads and do the binning on the DESY cluster.
Parent Sample¶
The Base Class¶
- class timewise.parent_sample_base.ParentSampleBase(base_name)[source]¶
Base class for parent sample. Any subclass must implement
ParentSample.df: a pandas.DataFrame consisting of at least three columns: two columns holding the sky position of each object in the form of right ascension and declination, and one column with a unique identifier.
ParentSample.default_keymap: a dictionary mapping the columns in ParentSample.df to ‘ra’, ‘dec’ and ‘id’
- Parameters:
base_name – determining the location of any data in the timewise data directory.
- plot_cutout(ind, arcsec=20, interactive=False, **kwargs)[source]¶
Plot the cutout images in all filters around the position of the object with index ind
- Parameters:
ind (int or list-like) – the index in the sample
arcsec (float) – the radius of the cutout
interactive (bool) – interactive mode
kwargs – any additional kwargs will be passed to matplotlib.pyplot.subplots()
- Returns:
figure and axes if interactive=True
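A minimal subclass satisfying the contract above might look like the following sketch. The class name, column names and catalogue entries are invented for illustration; a real implementation would inherit from timewise.parent_sample_base.ParentSampleBase and call super().__init__(base_name=...).

```python
import pandas as pd

class MySample:  # stands in for ParentSampleBase
    # map the DataFrame columns to 'ra', 'dec' and 'id'
    default_keymap = {"ra": "RA_deg", "dec": "DEC_deg", "id": "name"}

    def __init__(self):
        # at least three columns: sky position (ra, dec) and a unique identifier
        self.df = pd.DataFrame({
            "RA_deg": [10.6847, 201.3651],
            "DEC_deg": [41.2687, -43.0191],
            "name": ["M31", "CenA"],
        })
```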
WISEData¶
The Base Class¶
- class timewise.wise_data_base.WISEDataBase(base_name, parent_sample_class, min_sep_arcsec, n_chunks)[source]¶
Base class for WISE Data
- Parameters:
parent_sample_class (ParentSample class) – class for parent sample
base_name (str) – unique name to determine storage directories
min_sep (astropy.units.Quantity) – query region around source for positional query
whitelist_region (astropy.units.Quantity) – region around source where all datapoints are accepted in positional query
n_chunks (int) – number of chunks in declination
parent_wise_source_id_key (str) – key for the WISE source ID in the parent sample
parent_sample_wise_skysep_key (str) – key for the angular separation to the WISE source in the parent sample
parent_sample_default_entries (dict) – default entries for the parent sample
cache_dir (str) – directory for cached data
cluster_dir (str) – directory for cluster data
cluster_log_dir – directory for cluster logs
output_dir (str) – directory for output data
lightcurve_dir (str) – directory for lightcurve data
plots_dir (str) – directory for plots
submit_file (str) – file for cluster submission
tap_jobs (list[pyvo.dal.tap.TAPJob]) – TAP jobs
queue (multiprocessing.Queue) – queue for cluster jobs
clear_unbinned_photometry_when_binning (bool) – whether to clear unbinned photometry when binning
chunk_map (np.ndarray) – map of chunks
service_url (str) – URL of the TAP service
service (timewise.utils.StableTAPService) – custom TAP service, making sure that the TAP jobs are stable
active_tap_phases (set) – phases of TAP jobs that are still active
running_tap_phases (list) – phases of TAP jobs that are still running
done_tap_phases (set) – phases of TAP jobs that are done
query_types (list) – query types
table_names (pd.DataFrame) – map nice and program table names of WISE data tables
bands (list) – WISE bands
flux_key_ext (str) – key extension for flux keys
flux_density_key_ext (str) – key extension for flux density keys
mag_key_ext (str) – key extension for magnitude keys
luminosity_key_ext (str) – key extension for luminosity keys
error_key_ext (str) – key extension for error keys
band_plot_colors (dict) – plot colors for bands
photometry_table_keymap (dict) – keymap for photometry tables, listing the column names for flux, mag etc for the different WISE data tables
magnitude_zeropoints (dict) – magnitude zeropoints
constraints (list) – constraints for TAP queries selecting good datapoints as explained in the explanatory supplements
- add_flux_densities_to_saved_lightcurves(service)[source]¶
Adds flux densities to all downloaded lightcurves
- Parameters:
service (str) – The service with which the lightcurves were downloaded
- add_flux_density(lightcurve, mag_key, emag_key, mag_ul_key, f_key, ef_key, f_ul_key, do_color_correction=False)[source]¶
Adds flux densities to a lightcurve
- Parameters:
lightcurve (pandas.DataFrame) –
mag_key (str) – the key in lightcurve that holds the magnitude
emag_key (str) – the key in lightcurve that holds the error of the magnitude
mag_ul_key (str) – the key in lightcurve that holds the upper limit for the magnitude
f_key (str) – the key that will hold the flux density
ef_key (str) – the key that will hold the flux density error
f_ul_key (str) – the key that will hold the flux density upper limit
do_color_correction (bool) –
- Returns:
the lightcurve with flux density
- Return type:
pandas.DataFrame
- add_luminosity_to_saved_lightcurves(service, redshift_key=None, distance_key=None)[source]¶
Add luminosities to all lightcurves, calculated from flux densities and distance or redshift
- Parameters:
service (str) – the service with which the lightcurves were downloaded
redshift_key (str) – the key in the parent sample data frame that holds the redshift info
distance_key (str) – the key in the parent sample data frame that holds the distance info
- abstract bin_lightcurve(lightcurve)[source]¶
Bins a lightcurve
- Parameters:
lightcurve (pandas.DataFrame) – The unbinned lightcurve
- Returns:
the binned lightcurve
- Return type:
pd.DataFrame
- calculate_metadata(service, chunk_number=None, jobID=None, overwrite=True)[source]¶
- Calculates the metadata for all downloaded lightcurves.
Results will be saved under
</path/to/timewise/data/dir>/output/<base_name>/lightcurves/metadata_<service>.json
- Parameters:
service (str) – the service with which the lightcurves were downloaded
chunk_number (int) – the chunk number to use, default uses all chunks
jobID (int) – the job ID to use, default uses all lightcurves
overwrite (bool) – overwrite existing metadata file
- abstract calculate_metadata_single(lcs)[source]¶
Calculates some properties of the lightcurves
- Parameters:
lcs (pandas.DataFrame) – the lightcurve
- static calculate_position_mask(lightcurve, ra, dec, whitelist_region, return_all=False)[source]¶
Estimates the 90th percentile of the angular separations from the given position. Assuming a 2D Gaussian, calculates the standard deviation corresponding to the 90th percentile and keeps all datapoints within five times that standard deviation.
- Parameters:
lightcurve (pd.DataFrame) – unstacked lightcurve
ra (Sequence[float]) – RA in degrees of the source
dec (Sequence[float]) – Dec in degrees of the source
return_all (bool, optional) – if True, return all info collected in the selection process
whitelist_region (float) – region in which to keep all datapoints [arcsec]
- Returns:
positional mask (and result of the clustering algorithm and the mask for the closest allwise data if return_all is True)
- Return type:
list (return_all is False) or tuple (list, sklearn.cluster.HDBSCAN, list) (return_all is True)
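The selection described above can be sketched as follows. This is a simplified stand-alone version, not the library's exact implementation; in particular, the HDBSCAN clustering step is omitted.

```python
import numpy as np

def position_mask(sep_arcsec, whitelist_arcsec=1.0):
    """Keep datapoints within 5 sigma of the source position.

    For a 2D Gaussian the radial separation is Rayleigh-distributed, so the
    90th-percentile radius r90 relates to the standard deviation via
    r90 = sigma * sqrt(2 * ln(10)). Everything inside the whitelist region
    is kept unconditionally.
    """
    sep = np.asarray(sep_arcsec, dtype=float)
    r90 = np.percentile(sep, 90)
    sigma = r90 / np.sqrt(2 * np.log(10))
    return (sep < 5 * sigma) | (sep <= whitelist_arcsec)
```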
- find_color_correction(w1_minus_w2)[source]¶
Find the color correction factor based on the W1-W2 color.
- Parameters:
w1_minus_w2 (float) –
- Returns:
the color correction factor
- Return type:
float
- static get_db_name(table_name, nice=False)[source]¶
Get the right table name
- Parameters:
table_name – str, table name
nice – bool, whether to get the nice table name
- Returns:
str
- get_photometric_data(tables=None, perc=1, wait=0, service=None, nthreads=100, chunks=None, overwrite=True, remove_chunks=False, query_type='positional', skip_download=False, mask_by_position=False)[source]¶
Load photometric data from the IRSA server for the matched sample. The result will be saved under
</path/to/timewise/data/dir>/output/<base_name>/lightcurves/binned_lightcurves_<service>.json
- Parameters:
remove_chunks (bool) – remove single chunk files after binning
overwrite (bool) – overwrite already existing lightcurves and metadata
tables (str or list-like) – WISE tables to use for photometry query, defaults to AllWISE and NEOWISE-R photometry
perc (float) – percentage of sources to load photometry for, default 1
nthreads (int) – max number of threads to launch
service (str) – either of ‘gator’ or ‘tap’; by default selected based on the number of elements per chunk
wait (float) – time in hours to wait after submitting TAP jobs
chunks (list-like) – containing indices of chunks to download
query_type (str) – ‘positional’: query photometry based on distance from object, ‘by_allwise_id’: select all photometry points within a radius of 50 arcsec with the corresponding AllWISE ID
skip_download (bool) – if True skip downloading and only do binning
mask_by_position (bool) – if True mask single exposures that are too far away from the bulk
- get_position_mask(service, chunk_number)[source]¶
Get the position mask for a chunk
- Parameters:
service (str) – The service that was used to download the data, either of gator or tap
chunk_number (int) – chunk number
- Returns:
position masks
- Return type:
dict
- get_unbinned_lightcurves(chunk_number, clear=False)[source]¶
Get the unbinned lightcurves for a given chunk number.
- Parameters:
chunk_number (int) – int
clear (bool, optional) – remove files after loading, defaults to False
- load_data_product(service, chunk_number=None, jobID=None, return_filename=False, verify_contains_lightcurves=False)[source]¶
Load data product from disk
- Parameters:
service (str) – service used to download data (‘tap’ or ‘gator’)
chunk_number (int, optional) – chunk number to load, if None load combined file for this service
jobID (int, optional) – jobID to load, if None load the combined file for this chunk
return_filename (bool, optional) – return filename of data product, defaults to False
verify_contains_lightcurves (bool, optional) – verify that the data product contains lightcurves, defaults to False
- luminosity_from_flux_density(flux_density, band, distance=None, redshift=None, unit='erg s-1', flux_density_unit='mJy')[source]¶
Converts a flux density into a luminosity
- Parameters:
flux_density (float or numpy.ndarray) –
band (str) –
distance (astropy.Quantity) – distance to source, if not given will use luminosity distance from redshift
redshift (float) – redshift to use when calculating luminosity distance
unit (str or astropy.unit) – unit in which to give the luminosity, default is erg s-1
flux_density_unit (str or astropy.unit) – unit in which the flux density is given, default is mJy
- Returns:
the resulting luminosities
- Return type:
float or ndarray
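As a rough illustration of the underlying conversion, a common convention is nu * L_nu = 4 pi d^2 * nu * F_nu. Whether timewise uses exactly this convention, and which effective frequency it assumes per band, is an assumption here; the W1 frequency below is an illustrative value.

```python
import math

MJY_TO_CGS = 1e-26   # 1 mJy in erg s^-1 cm^-2 Hz^-1
W1_NU_HZ = 8.8e13    # effective frequency of W1 (~3.4 micron), an assumed value

def nu_l_nu(flux_density_mjy, distance_cm, nu_hz=W1_NU_HZ):
    """nu * L_nu = 4 * pi * d^2 * nu * F_nu, in erg s^-1."""
    f_nu = flux_density_mjy * MJY_TO_CGS
    return 4.0 * math.pi * distance_cm ** 2 * nu_hz * f_nu
```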
- match_all_chunks(table_name='AllWISE Source Catalog', save_when_done=True, additional_columns=None)[source]¶
Match the parent sample to a WISE catalogue and add the result to the parent sample.
- Parameters:
table_name (str) – The name of the table you want to match against
save_when_done (bool) – save the parent sample dataframe with the matching info when done
additional_columns (list) – optional, additional columns to add to the matching table
- plot_lc(parent_sample_idx, service='tap', plot_unbinned=False, plot_binned=True, interactive=False, fn=None, ax=None, save=True, lum_key='flux_density', **kwargs)[source]¶
Make a pretty plot of a lightcurve
- Parameters:
parent_sample_idx (int) – The index in the parent sample of the lightcurve
service (str) – the service with which the lightcurves were downloaded
plot_unbinned (bool) – plot unbinned data
plot_binned (bool) – plot binned lightcurve
interactive (bool) – interactive mode
fn (str) – filename, defaults to </path/to/timewise/data/dir>/output/plots/<base_name>/<parent_sample_index>_<lum_key>.pdf
ax – pre-existing matplotlib.Axis
save (bool) – save the plot
lum_key – the unit of luminosity to use in the plot, either of ‘mag’, ‘flux_density’ or ‘luminosity’
kwargs – any additional kwargs will be passed on to matplotlib.pyplot.subplots()
- Returns:
the matplotlib.Figure and matplotlib.Axes if interactive=True
- vegamag_to_flux_density(vegamag, band, unit='mJy', color_correction=None)[source]¶
Converts the detector-level brightness m in Vega magnitudes to a flux density F:
F = (F_nu / f_c) * 10 ^ (-m / 2.5)
where F_nu is the zeropoint flux for the corresponding band and f_c is a color correction factor.
- Parameters:
vegamag (float or numpy.ndarray) –
band (str) –
unit (str) – unit to convert the flux density to
color_correction (float or numpy.ndarray or dict) – the color correction factor; if a dict, the keys have to be ‘f_c(“band”)’
- Returns:
the flux densities
- Return type:
ndarray
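The formula above can be evaluated directly. In this sketch the W1 zeropoint flux of 309.54 Jy is taken from the WISE documentation and should be treated as an assumption, as should the function name.

```python
import numpy as np

F_NU_W1_JY = 309.54  # W1 Vega zeropoint flux in Jy (assumed value)

def vegamag_to_flux_density_mjy(m, f_nu_jy=F_NU_W1_JY, f_c=1.0):
    """F = (F_nu / f_c) * 10**(-m / 2.5), returned in mJy."""
    return (f_nu_jy / f_c) * 10.0 ** (-np.asarray(m, dtype=float) / 2.5) * 1e3
```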
Bin lightcurves by visit¶
- class timewise.wise_data_by_visit.WiseDataByVisit(base_name, parent_sample_class, min_sep_arcsec, n_chunks, clean_outliers_when_binning=True, multiply_flux_error=True)[source]¶
WISEData class to bin lightcurves by visit. A visit typically consists of some tens of observations, and individual visits are separated by about six months. The mean flux for one visit is calculated as the weighted mean of the data; the error on that mean is the root mean square, corrected by the t-value. Datapoints more than 20 times the rms away from the mean of their visit are flagged as outliers. In addition to the attributes of timewise.WISEDataBase, this class has the following attributes:
- Parameters:
clean_outliers_when_binning (bool) – whether to remove outliers by brightness when binning
mean_key (str) – the key for the mean
median_key (str) – the key for the median
rms_key (str) – the key for the rms
upper_limit_key (str) – the key for the upper limit
Npoints_key (str) – the key for the number of points
zeropoint_key_ext (str) – the key for the zeropoint
- bin_lightcurve(lightcurve)[source]¶
Combine the data by visits of the satellite to one region of the sky. A visit typically consists of some tens of observations, and individual visits are separated by about six months. The mean flux for one visit is calculated as the weighted mean of the data; the error on that mean is the root mean square, corrected by the t-value. Datapoints more than 100 times the rms away from the mean are flagged as outliers and, if self.clean_outliers_when_binning is True, removed from the calculation of the mean and its error.
- Parameters:
lightcurve (pandas.DataFrame) – the unbinned lightcurve
- Returns:
the binned lightcurve
- Return type:
pandas.DataFrame
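The weighted-mean binning described above can be sketched as follows. This stand-alone version omits the Student's-t correction and the outlier flagging of the real implementation.

```python
import numpy as np

def bin_visit(flux, flux_err):
    """Weighted mean of one visit and the weighted rms around it."""
    f = np.asarray(flux, dtype=float)
    w = 1.0 / np.asarray(flux_err, dtype=float) ** 2
    mean = np.sum(w * f) / np.sum(w)
    rms = np.sqrt(np.sum(w * (f - mean) ** 2) / np.sum(w))
    return mean, rms
```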
- calculate_epochs(f, e, visit_mask, counts, remove_outliers, outlier_mask=None)[source]¶
Calculates the binned epochs of a lightcurve.
- Parameters:
f (np.array) – the fluxes
e (np.array) – the flux errors
visit_mask (np.array) – the visit mask
counts (np.array) – the counts
remove_outliers (bool) – whether to remove outliers
outlier_mask (np.array) – the outlier mask
- Returns:
the epoch
- Return type:
float
- calculate_metadata_single(lc)[source]¶
Calculates some metadata, describing the variability of the lightcurves.
max_dif: maximum difference in magnitude between any two datapoints
min_rms: the minimum errorbar of all datapoints
N_datapoints: The number of datapoints
max_deltat: the maximum time difference between any two datapoints
mean_weighted_ppb: the weighted average brightness where the weights are the points per bin
- Parameters:
lc (dict) – the lightcurves
- Returns:
the metadata
- Return type:
dict
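A stand-alone sketch of these quantities (mean_weighted_ppb is omitted, and the argument names are illustrative):

```python
import numpy as np

def metadata_single(mag, mag_err, mjd):
    """Compute the variability metadata listed above for one lightcurve."""
    mag = np.asarray(mag, dtype=float)
    return {
        "max_dif": float(np.max(mag) - np.min(mag)),       # brightest minus faintest
        "min_rms": float(np.min(mag_err)),                 # smallest errorbar
        "N_datapoints": int(mag.size),
        "max_deltat": float(np.max(mjd) - np.min(mjd)),    # total time span
    }
```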
- static get_visit_map(lightcurve)[source]¶
Create a map from datapoint to visit
- Parameters:
lightcurve (pd.DataFrame) – the unbinned lightcurve
- Returns:
visit map
- Return type:
np.ndarray
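Since visits are separated by about six months, such a map can be sketched by splitting on large gaps in observation time. The 100-day threshold is an assumption; the library's exact criterion may differ.

```python
import numpy as np

def visit_map(mjd, gap_days=100.0):
    """Assign each (time-sorted) datapoint a visit index; a new visit
    starts wherever consecutive observations are more than gap_days apart."""
    mjd = np.sort(np.asarray(mjd, dtype=float))
    return np.concatenate(([0], np.cumsum(np.diff(mjd) > gap_days)))
```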
- plot_diagnostic_binning(service, ind, lum_key='mag', interactive=False, fn=None, save=True, which='panstarrs', arcsec=20)[source]¶
Show a skymap of the single detections and which bin they belong to next to the binned lightcurve
- Parameters:
service (str) – service used to download data, either of ‘tap’ or ‘gator’
ind (str, int) – index of the object in the parent sample
lum_key (str) – the key of the brightness unit, either of flux (instrument flux in counts) or mag
interactive (bool) – if function is used interactively, return mpl.Figure and mpl.axes if True
fn (str) – filename for saving
save (bool) – saves figure if True
which (str) – survey to get the cutout from, either of ‘sdss’ or ‘panstarrs’
arcsec (float) – size of cutout
- Returns:
Figure and axes if interactive=True
- Return type:
mpl.Figure, mpl.Axes
Use the DESY cluster in Zeuthen to do the binning¶
- class timewise.wise_bigdata_desy_cluster.WISEDataDESYCluster(base_name, parent_sample_class, min_sep_arcsec, n_chunks, clean_outliers_when_binning=True, multiply_flux_error=True)[source]¶
A class to download WISE data with multiple threads and do the binning on the DESY cluster. In addition to the attributes of WiseDataByVisit this class has the following attributes:
- Parameters:
executable_filename (str) – the filename of the executable that will be submitted to the cluster
submit_file_filename (str) – the filename of the submit file that will be submitted to the cluster
job_id (str) – the job id of the submitted job
cluster_jobID_map (dict) – a dictionary mapping the chunk number to the cluster job id
clusterJob_chunk_map (dict) – a dictionary mapping the cluster job id to the chunk number
cluster_info_file (str) – the filename of the file that stores the cluster info, loaded by the cluster jobs
start_time (float) – the time when the download started
- condor_status(job_id)[source]¶
Get the status of jobs running on condor.
- Returns:
number of jobs that are done, running, waiting, total, held
- static get_condor_status()[source]¶
Queries condor to get the cluster status.
- Returns:
str, output of the query command
- get_coverage(chunk, lum_key, load_from_bigdata_dir=False)[source]¶
Get the coverage of the MEASURED median for a given chunk and lum_key
- Parameters:
chunk (int or list[int]) – chunk number
lum_key (str) – luminosity key
load_from_bigdata_dir (bool, optional) – if True, load the coverage from the bigdata directory
- get_red_chi2(chunk, lum_key, use_bigdata_dir=False)[source]¶
Get the reduced chi2 for a given chunk or multiple chunks
- Parameters:
chunk (int or list) – the chunk number or list of chunk numbers
lum_key (str) – the unit of luminosity to use in the plot, either of ‘mag’, ‘flux’ or ‘flux_density’
use_bigdata_dir (bool, optional) – load from the big data storage directory, default is False
- Returns:
the reduced chi2 for each band, the DataFrame will have columns chi2, med_lum and N_datapoints
- Return type:
dict[str, pd.DataFrame]
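The reduced chi-squared here measures variability against a constant (weighted-mean) model. A minimal stand-alone version, assuming N-1 degrees of freedom (the library's exact definition may differ):

```python
import numpy as np

def reduced_chi2(flux, flux_err):
    """Reduced chi2 of a lightcurve with respect to its weighted mean."""
    f = np.asarray(flux, dtype=float)
    e = np.asarray(flux_err, dtype=float)
    w = 1.0 / e ** 2
    mean = np.sum(w * f) / np.sum(w)
    return float(np.sum(((f - mean) / e) ** 2) / (f.size - 1))
```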
- get_sample_photometric_data(max_nTAPjobs=8, perc=1, tables=None, chunks=None, cluster_jobs_per_chunk=100, wait=5, remove_chunks=False, query_type='positional', overwrite=True, storage_directory=None, node_memory='8G', skip_download=False, skip_input=False, mask_by_position=False)[source]¶
An alternative to get_photometric_data() that uses the DESY cluster and is optimised for large datasets.
- Parameters:
max_nTAPjobs (int) – The maximum number of TAP jobs active at the same time.
perc (float) – The percentage of chunks to download
tables (str or list-like) – The tables to query
chunks (list-like) – chunks to download, default is all of the chunks
cluster_jobs_per_chunk (int) – number of cluster jobs per chunk
wait (float) – time in hours to wait after submitting TAP jobs
remove_chunks (bool) – remove single chunk files after binning
query_type (str) – ‘positional’: query photometry based on distance from object, ‘by_allwise_id’: select all photometry points within a radius of 50 arcsec with the corresponding AllWISE ID
overwrite (bool) – overwrite already existing lightcurves and metadata
storage_directory (str) – move binned files and raw data here after work is done
node_memory (str) – memory per node on the cluster, default is 8G
skip_download (bool) – if True, assume data is already downloaded, only do binning in that case
skip_input (bool) – if True do not ask if data is correct before download
mask_by_position (bool) – if True mask single exposures that are too far away from the bulk
- get_submit_file_filename(ids)[source]¶
Get the filename of the submit file for given job ids
- Parameters:
ids (list) – list of job ids
- Returns:
filename
- Return type:
str
- load_data_product(service, chunk_number=None, jobID=None, return_filename=False, use_bigdata_dir=False, verify_contains_lightcurves=False)[source]¶
Load data product from disk
- Parameters:
service (str) – service used to download data (‘tap’ or ‘gator’)
chunk_number (int, optional) – chunk number to load, if None load combined file for this service
jobID (int, optional) – jobID to load, if None load the combined file for this chunk
return_filename (bool, optional) – return filename of data product, defaults to False
verify_contains_lightcurves (bool, optional) – verify that the data product contains lightcurves, defaults to False
- make_chi2_plot(index_mask=None, chunks=None, load_from_bigdata_dir=False, lum_key='_flux_density', interactive=False, save=False, nbins=100, cumulative=True, upper_bound=4)[source]¶
Make a plot of the reduced chi2 distribution for a given chunk or multiple chunks
- Parameters:
index_mask (dict) – a mask to apply to the parent sample, e.g. {‘AGNs’: agn_mask}
chunks (int or list) – the chunk number or list of chunk numbers
load_from_bigdata_dir (bool, optional) – load from the big data storage directory, default is False
lum_key (str) – the unit of luminosity to use in the plot, either of ‘mag’, ‘flux’ or ‘flux_density’
interactive (bool) – return the figure and axes if True, default is False
save (bool) – save the plot, default is False
nbins (int) – the number of bins to use in the histogram, default is 100
cumulative (bool) – plot the cumulative distribution, default is True
upper_bound (float) – the upper bound of the x-axis, default is 4
- Returns:
the matplotlib.Figure and matplotlib.Axes if interactive=True
- Return type:
tuple[mpl.Figure, mpl.Axes]
- make_coverage_plots(index_mask=None, chunks=None, load_from_bigdata_dir=False, lum_key='_flux_density', interactive=False, save=False, nbins=100)[source]¶
Make the coverage plots for the measured median of the specified luminosity unit
- Parameters:
index_mask (dict, optional) – index mask to apply to the data, e.g. {“AGNs”: agn_mask}
chunks (list[int], int, optional) – chunks to use, if None use all chunks
load_from_bigdata_dir (bool, optional) – if True, load the coverage from the bigdata directory
lum_key (str, optional) – luminosity key, either of “_flux_density” or “_mag”, default is “_flux_density”
interactive (bool, optional) – if True, return the figures and axes, otherwise close them
save (bool, optional) – if True, save the figures
nbins (int, optional) – number of bins for the histograms
- Returns:
if interactive, return the figures and axes, otherwise close them
- Return type:
list[tuple[matplotlib.figure.Figure, matplotlib.axes.Axes]]
- make_submit_file(job_ids: int | List[int], node_memory: str = '8G', mask_by_position: bool = False)[source]¶
Produces the submit file that will be submitted to the NPX cluster.
- Parameters:
job_ids (int or list of ints) – The job ID or list of job IDs to submit
node_memory (str) – The amount of memory to request for each node
mask_by_position (bool) – if True mask single exposures that are too far away from the bulk
- plot_lc(parent_sample_idx, service='tap', plot_unbinned=False, plot_binned=True, interactive=False, fn=None, ax=None, save=True, lum_key='flux_density', load_from_bigdata_dir=False, **kwargs)[source]¶
Make a pretty plot of a lightcurve
- Parameters:
parent_sample_idx (int or str) – The index in the parent sample of the lightcurve
service (str) – the service with which the lightcurves were downloaded
plot_unbinned (bool) – plot unbinned data
plot_binned (bool) – plot binned lightcurve
interactive (bool) – interactive mode
fn (str) – filename, defaults to </path/to/timewise/data/dir>/output/plots/<base_name>/<parent_sample_index>_<lum_key>.pdf
ax – pre-existing matplotlib.Axis
save (bool) – save the plot
lum_key (str) – the unit of luminosity to use in the plot, either of ‘mag’, ‘flux_density’ or ‘luminosity’
load_from_bigdata_dir (bool) – load from the big data storage directory
kwargs – any additional kwargs will be passed on to matplotlib.pyplot.subplots()
- Returns:
the matplotlib.Figure and matplotlib.Axes if interactive=True
- run_cluster(node_memory, service)[source]¶
Run the DESY cluster
- Parameters:
node_memory (str) – memory per node
service (str) – service to use for querying the data
- submit_to_cluster(node_memory, single_chunk=None, mask_by_position=False)[source]¶
Submit jobs to cluster
- Parameters:
node_memory (str) – memory per node
single_chunk (int) – number of single chunk to run on the cluster
mask_by_position (bool) – if True mask single exposures that are too far away from the bulk
- Returns:
ID of the cluster job
- Return type:
int
Point Source Utils¶
- timewise.point_source_utils.get_point_source_wise_data(base_name, ra, dec, min_sep_arcsec=10, match=False, **kwargs)[source]¶
Get a WISEData instance for a point source
- Parameters:
base_name (str) – base name for storage in the data directory
ra (float) – right ascension
dec (float) – declination
min_sep_arcsec (float) – search radius in arcsec
match (bool) – match to AllWISE Source Catalogue
kwargs (dict) – keyword arguments passed to WISEData.get_photometric_data()
- Returns:
WISEData