Analysis tools

Here we will outline some of the available tools that can help inspect the performance of the emulator. The example data will be the Schecter function example:

import swiftemulator as se
from swiftemulator.emulators import gaussian_process
import numpy as np

def log_schecter_function(log_M, log_M_star, alpha):
    M = 10 ** log_M
    M_star = 10 ** log_M_star
    return np.log10( (1 / M_star) * (M / M_star) ** alpha * np.exp(- M / M_star ))

model_specification = se.ModelSpecification(
    number_of_parameters=2,
    parameter_names=["log_M_star","alpha"],
    parameter_limits=[[11.,12.],[-1.,-3.]],
    parameter_printable_names=["Mass at knee","Low mass slope"],
)

log_M_star = np.random.uniform(11., 12., 100)
alpha      = np.random.uniform(-1., -3., 100)

modelparameters = {}
for unique_identifier in range(100):
    modelparameters[unique_identifier] = {"log_M_star": log_M_star[unique_identifier],
                                        "alpha": alpha[unique_identifier]}

model_parameters = se.ModelParameters(model_parameters=modelparameters)

modelvalues = {}
for unique_identifier in range(100):
    independent = np.linspace(10,12,10)
    dependent = log_schecter_function(independent,
                                    log_M_star[unique_identifier],
                                    alpha[unique_identifier])
    dependent_error = 0.02 * dependent
    modelvalues[unique_identifier] = {"independent": independent,
                                    "dependent": dependent,
                                    "dependent_error": dependent_error}

model_values = se.ModelValues(model_values=modelvalues)

schecter_emulator = gaussian_process.GaussianProcessEmulator()
schecter_emulator.fit_model(model_specification=model_specification,
                            model_parameters=model_parameters,
                            model_values=model_values)

Cross checks

To set up a cross check, the emulator is trained on all but one of the input data-sets. The resulting emulator can then be compared against the model that was left out.

Cross checks are the main way of quantifying emulator performance in the absence of validation data. When emulating via cosmological simulations it is likely to be very expensive to generate a validation dataset of sufficient size. for cases like this SWIFT-Emulator has an easy way of setting up cross-checks.

The swiftemulator.sensitivity.cross_check() object acts identically to swiftemulator.emulators.gaussian_process() and takes the same inputs. By setting the cross-checks up in this way you can directly compare the results with the main GP that you use for predictions.

from swiftemulator.sensitivity import cross_check

schecter_ccheck = cross_check.CrossCheck()
schecter_ccheck.build_emulators(model_specification=model_specification,
                        model_parameters=model_parameters,
                        model_values=model_values)

In this case build_emulators takes the place of fit_model. Note that build_emulators now creates N independent trained emulators, where N is the number of models, so this can take quite a long time. For this example the amount of models was reduced from 100 to 20.

Once the emulators have been build there are some inherent tools to have a look at the result (see swiftemulator.sensitivity.cross_check()). We will use build_mocked_model_values_original_independent() to compare the cross-check predictions with the original data.

import matplotlib.pyplot as plt

data_by_cc = schecter_ccheck.build_mocked_model_values_original_independent()

for unique_identifier in range(20):
    cc_over_og = data_by_cc[unique_identifier]["dependent"] / \
                model_values[unique_identifier]["dependent"]
    plt.plot(data_by_cc[unique_identifier]["independent"],cc_over_og)
    plt.xlabel("Mass")
    plt.ylabel("Cross-check / Truth")

plt.savefig("Cross_check_accuracy.png",dpi=200)

Just with a few line we are able to quantify how accurate the emulator is. Also note that any ModelValues container can be parsed as if it is a dictionary.

Sweeps Of Parameter Space

One of the advantages of using emulators is that it supplies you with a fully continuous model of the given function. Besides fitting the parameters it is often interesting to see the effect of changing a single parameter, by doing a sweep.

This is implemented into the SWIFT-Emulator with swiftemulator.mocking.mock_sweep().

from swiftemulator.mocking import mock_sweep

center = {"log_M_star": 11.5, "alpha": -2.0}

Mock_values, Mock_parameters = mock_sweep(schecter_emulator
                                      ,model_specification
                                      ,6,"alpha",center)

for mock_name in Mock_values.keys():
    plt.plot(Mock_values[mock_name]["independent"],
            Mock_values[mock_name]["dependent"],
            label = "Alpha = " +str(Mock_parameters[mock_name]["alpha"])[:4])

plt.xlabel("Stellar mass")
plt.ylabel("dn/dlogM")
plt.legend()
plt.savefig("parameter_sweep.png",dpi=200)

mock_sweep returns the values and parameter of the sweep as ModelValues and ModelParameters containers, that are easy to parse.

Model Parameters Features

This highlights two small functions that are attached to the swiftemulator.backend.model_parameters() object. The first is the ability to generate a quick plot of the experimental design using corner.

model_parameters.plot_model(model_specification)

Note that the axis label used here are the one passed to the model specification. This can be used to have a quick look at whether your space is well sampled.

After finding a set of best fit model parameters it is sometimes usefull to see if there are any individual model that has similar values. find_closest_model takes a dictionary of input values and finds the training model that is closets to those values.

best_model = {"log_M_star": 11.3, "alpha": -2.1}

model_parameters.find_closest_model(best_model,number_of_close_models=5)

which outputs

([2, 12, 18, 19, 3],
[{'log_M_star': 11.26347510702813, 'alpha': -1.9614226414699145},
{'log_M_star': 11.507944778215956, 'alpha': -1.9818583963792449},
{'log_M_star': 11.19527147203741, 'alpha': -1.8330160108907092},
{'log_M_star': 11.033961506507945, 'alpha': -2.275313906753826},
{'log_M_star': 11.67912812994198, 'alpha': -2.0664526312834353}])

It returns a list with the unique_identifier of each close model, and the model parameters belonging to that model. This can be used to explore the models close to you best fit model, for example to check how well sampled that part of parameter space is.

Checking Hyperparameters

In general one should not look at the hyperparameters. They should only be used as a diagnostic when the emulator is giving strange results. The SWIFT-Emulator provides an easy way to check the parameterspace of the hyperparameters. The hyperparameters are optimised to using the marginalised likelihood, so we can inspect how well converged they are by looking at the probability distribution of each individual hyperparameter. This is done via swiftemulator.emulators.gaussian_process_mcmc(). In this case MCMC implies the use of Markov chian Monte Carlo (via emcee) to find the best hyperparameters, allowing us to look at the complete parameter space.

from swiftemulator.emulators import gaussian_process_mcmc
schecter_emulator_mcmc = gaussian_process_mcmc.GaussianProcessEmulatorMCMC(burn_in_steps=1
                                                                          ,mcmc_steps=1000)
schecter_emulator_mcmc.fit_model(model_specification=model_specification,
                        model_parameters=model_parameters,
                        model_values=model_values)

schecter_emulator_mcmc.plot_hyperparameter_distribution()

This method is a lot slower than the default hyperparameter optimisation, and may take some time to compute. The main take away from plots like this is to see whether the hyperparameters are converged, and whether they are consistent with the faster optimisation method.