2D Histogram Tutorial

This notebook shows an example of how to use the Hist2dQuadTree class in astroQTpy.

Simple 2D Gaussian

First, we’ll create some mock data drawn from two indepedent normal distributions.

[1]:
import numpy as np

N_samples = 10_000
x = np.random.normal(0, 1.0, N_samples)
y = np.random.normal(0, 1.0, N_samples)

Now import Hist2dQuadTree from the quadtree module in astroQTpy and create a new instance.

[2]:
from astroqtpy.quadtree import Hist2dQuadTree
[3]:
guassian_2d_histogram = Hist2dQuadTree(
    -5, 5, -5, 5,
    N_points=10,  # subdivide grid if node has more than this number of points...
    max_depth=6   # ...until the quadtree reaches this depth.
)

Now add the fake data to the quadtree object.

[4]:
guassian_2d_histogram.add_data(x, y)

Plot it!

[5]:
import matplotlib.pyplot as plt
[6]:
fig, ax = plt.subplots(figsize=(7,5))
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)

# plot data
ax.scatter(x, y, marker='.', s=1, c='k', alpha=0.2, zorder=99)

# plot the quadtree
hist2d = guassian_2d_histogram.draw_tree(ax, cmap='viridis', vmin=0, vmax=150)
plt.colorbar(hist2d, ax=ax, label='frequency', extend='max')
ax.set_xlabel('x')
ax.set_ylabel('y')

plt.show()
../_images/tutorials_hist2d_tutorial_10_0.png

Gaia H-R Diagram

As an example of something more specific to astronomy, let’s make an HRD using some Gaia data and astroQTpy.

First, we’ll need to pull some stellar data from the Gaia archive. We can do that with astroquery.

[7]:
from astroquery.gaia import Gaia
[8]:
import time

def get_gaia_query(q):
    start = time.time()
    job = Gaia.launch_job_async(q)
    print(f"Total time: {time.time()-start:0.2f} sec")
    return job.get_results()

For this example, we’ll keep our query simple. Let’s just take the first 10,000 sources from the astrophysical_parameters table that have calculated luminosity, Teff, mass, and logg.

[9]:
my_query = """
    SELECT TOP 10000 log10(lum_flame) AS log_lum,
        log10(teff_gspphot) AS log_teff,
        mass_flame AS mass,
        logg_gspphot AS log_g
    FROM gaiadr3.astrophysical_parameters
    WHERE lum_flame IS NOT NULL
        AND teff_gspphot IS NOT NULL
        AND mass_flame IS NOT NULL
        AND logg_gspphot IS NOT NULL
"""
[10]:
# run query and print first 5 rows
gaia_results = get_gaia_query(my_query)
gaia_results[:5]
INFO: Query finished. [astroquery.utils.tap.core]
Total time: 3.07 sec
[10]:
Table length=5
log_lumlog_teffmasslog_g
solMasslog(cm.s**-2)
float64float64float32float32
1.27496424006726033.69238044266600961.75113323.0866
-0.152090919189545433.7591347545585730.99978153.8079
0.071553419051317253.748905174665250.91446224.3056
-1.13276414942489813.58048630650728050.5540744.815
-0.50584477476263463.6778015924622630.76554534.6798

Let’s plot the Gaia data on an H-R Diagram.

[11]:
fig, ax = plt.subplots(figsize=(7,5))
ax.set_xlim(3.5, 4.2)
ax.set_ylim(-1.9, 3.9)

ax.scatter(gaia_results['log_teff'], gaia_results['log_lum'], marker='.', s=1, c='k', alpha=0.5)
ax.set_xlabel('$\log T_{eff}$/K')
ax.set_ylabel('$\log L / L_\odot$')

ax.invert_xaxis()
../_images/tutorials_hist2d_tutorial_18_0.png

Now we can use an astroQTpy quadtree to resctructure the data and create some useful visualizations. Let’s go ahead and import the Hist2dQuadTree class and create a new instance, like we did before. Then, we’ll add the Gaia data.

[12]:
hrd_2d_histogram = Hist2dQuadTree(
    3.5, 4.2, -1.9, 3.9,
    N_points=5,   # subdivide grid if node has more than this number of points...
    max_depth=8   # ...until the quadtree reaches this depth.
)

# add data
hrd_2d_histogram.add_data(gaia_results['log_teff'].value.data, gaia_results['log_lum'].value.data)

First, let’s plot the data and quadtree without shading in the nodes (the plot can start looking a bit busy depending on the maximum depth, number of points per node, etc.).

[13]:
fig, ax = plt.subplots(figsize=(8,8))
ax.set_xlim(3.5, 4.2)
ax.set_ylim(-1.9, 3.9)

# plot data
ax.scatter(gaia_results['log_teff'], gaia_results['log_lum'], marker='.', s=1, c='k', alpha=0.2, zorder=99)

# plot the quadtree
hist2d = hrd_2d_histogram.draw_tree(ax, show_colors=False, cmap='viridis', vmin=0)
ax.set_xlabel('$\log T_{eff}$/K')
ax.set_ylabel('$\log L / L_\odot$')

ax.invert_xaxis()

plt.show()
../_images/tutorials_hist2d_tutorial_22_0.png

Now let’s visualize the HRD a slightly different way, this time turning back on the colors, but removing the node borders and data points. This time we get a plot that looks a bit cleaner!

[14]:
fig, ax = plt.subplots(figsize=(10,8))
ax.set_xlim(3.5, 4.2)
ax.set_ylim(-1.9, 3.9)

# plot the quadtree
hist2d = hrd_2d_histogram.draw_tree(ax, show_lines=False, cmap='viridis', vmin=0, vmax=45)
plt.colorbar(hist2d, ax=ax, label='frequency', extend='max')
ax.set_xlabel('$\log T_{eff}$/K')
ax.set_ylabel('$\log L / L_\odot$')

ax.invert_xaxis()

plt.show()
../_images/tutorials_hist2d_tutorial_24_0.png

Instead of counting the frequency of points within each node, we can select different bin statistics to visualize useful quantities. For example, we can plot the average \(\log g\) of stars within each grid cell. To do this, we just need to pass a third data array to the quadtree object and choose a node statistic.

[15]:
hrd_2d_histogram = Hist2dQuadTree(
    3.5, 4.2, -1.9, 3.9,
    N_points=5,   # subdivide grid if node has more than this number of points...
    max_depth=8,  # ...until the quadtree reaches this depth.
    node_statistic='mean'
)

hrd_2d_histogram.add_data(
    gaia_results['log_teff'].value.data,
    gaia_results['log_lum'].value.data,
    gaia_results['log_g'].value.data
)

Let’s plot the HRD again, this time showing the points and colors representing average \(\log g\).

[16]:
fig, ax = plt.subplots(figsize=(10,8))
ax.set_xlim(3.5, 4.2)
ax.set_ylim(-1.9, 3.9)

# plot data
ax.scatter(gaia_results['log_teff'], gaia_results['log_lum'], c='w', marker='.', s=1, alpha=0.2, zorder=99)

# plot the quadtree
hist2d = hrd_2d_histogram.draw_tree(ax, show_lines=False, cmap='magma', vmin=0, vmax=5)
plt.colorbar(hist2d, ax=ax, label='$\langle \log g \\rangle$', extend='max')
ax.set_xlabel('$\log T_{eff}$/K')
ax.set_ylabel('$\log L / L_\odot$')

ax.invert_xaxis()

plt.show()
../_images/tutorials_hist2d_tutorial_28_0.png