Simulation, Machine Learning, and their Synergy in Addressing Past Socio-Environmental Systems

available at https://andros-spica.github.io/IAMAHA-Angourakis-2023/
https://andros-spica.github.io/IAMAHA-Angourakis-2023/index.html?print-pdf (printable version)

reform.ressourcencampus-bochum.de

Outline

Simulation versus machine learning?
Simulation + machine learning → examples
Implications for modelling past socio-ecological systems
Conclusions

1. Simulation versus machine learning?

An "Age" of machine learning

Deep learning made Artificial Intelligence pop (again)

Data Science, Big Data and low-cost/high-performance computing

Simulation associated with "old" Artificial Intelligence

Is modelling a fitness contest? (pun intended)

When to use: purposes and context

Simulation in archaeology:
"Can you validate your model?"*

Epstein, J. M. (2008). Why Model? Journal of Artificial Societies and Social Simulation, 11(4), 12.

AI hype vs reality (figure): Flovik, V. (2023, January 25). Machine Learning: From hype to real-world applications. Medium.

How similar or different are Machine Learning and simulation models?

The epistemological context

Diagrams available at https://github.com/Andros-Spica/modelling-simulation-graphs
Icon assets from "The Noun Project" (thenounproject.com)

Approach	Input	Model	Output
Machine Learning	known	learned after implementation	known + predicted after training
Simulation	known + assumed	known before implementation	learned after iterations

2. Simulation + Machine Learning

Moving forward...

If both approaches serve different epistemological purposes, why should we choose one over the other?

How can simulation and machine learning be combined into the same methodology?

Icon assets from "The Noun Project" (thenounproject.com)

von Rueden, L., Mayer, S., Sifa, R., Bauckhage, C., & Garcke, J. (2020). Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions. In M. R. Berthold, A. Feelders, & G. Krempl (Eds.), Advances in Intelligent Data Analysis XVIII (Vol. 12080, pp. 548–560). Springer International Publishing. 10.1007/978-3-030-44584-3_43

Simulation for
the creation of training datasets

Jareño, S. J. N., Helden, D. P. van, Mirkes, E. M., Tyukin, I. Y., & Allison, P. M. (2021). Learning from Scarce Information: Using Synthetic Data to Classify Roman Fine Ware Pottery. Entropy, 23(9). 10.3390/e23091140

5373 images of 162 terra sigillata from Roman Britain
A classification problem
Simulation (or procedural generation) of 1000+ images per class using Matplotlib and Blender (experimentation)
Taking advantage of deep learning, while minimising the curse of dimensionality
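The idea of simulating a training dataset can be sketched in a few lines. This is a toy illustration under stated assumptions, not the pipeline of Jareño et al. (2021): the two vessel classes, the `simulate_profile` shapes, and the nearest-centroid classifier (standing in for deep learning) are all hypothetical.

```python
import math
import random

def simulate_profile(vessel_class, rng, n_points=20, noise=0.05):
    """Procedurally generate one noisy vessel profile (radius at each height).

    The two shapes are hypothetical: a 'bowl' widens towards the rim,
    a 'jar' bulges in the middle and narrows at the neck.
    """
    profile = []
    for i in range(n_points):
        h = i / (n_points - 1)  # relative height, 0 = base, 1 = rim
        if vessel_class == "bowl":
            radius = 0.3 + 0.7 * math.sqrt(h)
        else:
            radius = 0.5 + 0.4 * math.sin(math.pi * h)
        profile.append(radius + rng.gauss(0, noise))
    return profile

def centroid(profiles):
    """Mean profile of a class (column-wise average)."""
    return [sum(column) / len(column) for column in zip(*profiles)]

def classify(profile, centroids):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda c: distance(profile, centroids[c]))

rng = random.Random(42)
classes = ["bowl", "jar"]

# 1000 simulated examples per class, echoing the "1000+ images per class" above
training = {c: [simulate_profile(c, rng) for _ in range(1000)] for c in classes}
centroids = {c: centroid(training[c]) for c in classes}

# Held-out simulated profiles; a real study would test on real drawings or photos
held_out = [(c, simulate_profile(c, rng)) for c in classes for _ in range(100)]
accuracy = sum(classify(p, centroids) == c for c, p in held_out) / len(held_out)
print(f"accuracy on held-out simulated profiles: {accuracy:.2f}")
```

The point of the sketch is the workflow (simulate many labelled examples, train, then classify), not the classifier itself; accuracy on simulated data says little until the model is checked against real sherds.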

Simulation datasets are (potentially) "Big Data"

Figures: Jareño et al. 2021, Fig. 8 and 9

Machine Learning for
model generation

Immense (and growing) collection of Large Language Models (LLMs) of various kinds and domains
e.g. ChatGPT, Google Bard, GitHub Copilot, etc.
Conversational technique as your own "think tank"
Variable fluency with programming languages
e.g. Python > NetLogo
Potential of using Web and local files data
Technologies in "beta" - e.g. AgentGPT
Iterative process still requiring a lot of know-how both on simulation and ML

You are not alone anymore!

Machine Learning for
pattern detection in simulation output

Less new and hyped, yet more important
Realm of exploratory and inferential statistics
e.g. linear models, multidimensional scaling (PCA, CA, etc), clustering (hierarchical, k-means, etc), Bayesian statistics, etc.
Good, fairly consolidated practice:
sensitivity analysis with Random Forest (or similar) on randomised exploration of parameters
Many sources available (publications, libraries/packages, scripts)
Always remember the nature of simulation output (e.g. account for model and experiment biases)
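A sensitivity analysis on a randomised exploration of parameters can be sketched as below. Two assumptions are worth stating loudly: `toy_simulation` is a made-up model, and the Random Forest importance mentioned above is swapped for a much simpler stand-in, the Pearson correlation between each parameter and the output, which only captures linear effects.

```python
import math
import random

def toy_simulation(growth, mortality, irrelevant, rng):
    """Hypothetical stochastic model: the output depends strongly on growth,
    weakly on mortality, and not at all on the third parameter."""
    return 10 * growth - 3 * mortality + rng.gauss(0, 0.5)

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(1)
names = ["growth", "mortality", "irrelevant"]

# Randomised exploration: every run draws all parameters uniformly from [0, 1]
runs = [[rng.random() for _ in names] for _ in range(500)]
outputs = [toy_simulation(*params, rng) for params in runs]

# One sensitivity score per parameter (sign = direction, magnitude = strength)
sensitivity = {
    name: pearson([run[i] for run in runs], outputs)
    for i, name in enumerate(names)
}
ranking = sorted(names, key=lambda n: abs(sensitivity[n]), reverse=True)

for name in ranking:
    print(f"{name:>10}: {sensitivity[name]:+.2f}")
```

With real simulation output, a Random Forest (or similar) would replace `pearson` to pick up non-linear effects and interactions; the sampling and ranking steps stay the same.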

After 10,000 runs...
the headache comes

Figures: Angourakis et al. 2017 - Fig. 7; Angourakis et al. 2022 - Fig. 7
Angourakis, A., Alcaina-Mateos, J., Madella, M., & Zurro, D. (2022). Human-Plant Coevolution: A modelling framework for theory-building on the origins of agriculture. PLOS ONE, 17(9), e0260904. https://doi.org/10.1371/journal.pone.0260904

Machine Learning for
parameter calibration and optimisation

Old problem, but still presenting challenges
Effective parameter calibration is one of the missing links between "toy model" and "digital twin"
Several quite consolidated algorithms for linear and non-linear models
e.g., those available with optim() in R, Genetic Algorithms in OpenMOLE, etc.
Challenges come with number of parameters and stochasticity
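The stochasticity challenge can be made concrete with a minimal calibration sketch. Everything here is an assumption for illustration: the `stochastic_logistic` model, the "target" series (generated by the same model with known parameters), and the use of plain random search rather than `optim()` or a genetic algorithm.

```python
import random

def stochastic_logistic(r, k, rng, steps=40, x0=1.0, r_noise=0.05):
    """Hypothetical stochastic model: logistic growth whose rate is perturbed at every step."""
    x, series = x0, []
    for _ in range(steps):
        r_t = r + rng.gauss(0, r_noise)
        x = x + r_t * x * (1 - x / k)
        series.append(x)
    return series

def loss(params, target, rng, replicates=5):
    """Mean squared error against the target, averaged over replicate runs
    because a single stochastic run gives a noisy score."""
    r, k = params
    total = 0.0
    for _ in range(replicates):
        run = stochastic_logistic(r, k, rng)
        total += sum((a - b) ** 2 for a, b in zip(run, target)) / len(target)
    return total / replicates

rng = random.Random(7)
target = stochastic_logistic(0.3, 100.0, rng)  # stands in for empirical data

# Random search over a box of (r, k) values
best_params, best_loss = None, float("inf")
for _ in range(300):
    candidate = (rng.uniform(0.05, 1.0), rng.uniform(50.0, 150.0))
    candidate_loss = loss(candidate, target, rng)
    if candidate_loss < best_loss:
        best_params, best_loss = candidate, candidate_loss

print(f"best r = {best_params[0]:.2f}, best k = {best_params[1]:.1f}")
```

Calling `loss` twice with the same parameters returns different values, which is why replicates are averaged; with more parameters the number of candidates needed grows quickly, which is where dedicated optimisation algorithms come in.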

Threading an invisible, multidimensional landscape

Parameter fitting demo: parameter optimisation problems when fitting target data with stochastic models
top: double logistic model (5 pars.) fitted to (unknown) model output;
middle: stepped double logistic curve (7 pars.) fitted to (unknown) model output;
bottom: stepped double logistic curve (7 pars., 14 hyperpars.) (guess based on target aggregate stats) versus target data (green)

Machine Learning for
model selection + pattern detection in output + parameter calibration and optimisation

Carrignon, S., Brughmans, T., & Romanowska, I. (2020). Tableware trade in the Roman East: Exploring cultural and economic transmission with agent-based modelling and approximate Bayesian computation. PLOS ONE, 15(11), e0240414. 10.1371/journal.pone.0240414

Trade of tableware from eastern Mediterranean (Hellenistic and Roman periods)
8730 fragments, 5 types, 178 sites (presence/absence per type-site)
A cultural evolution problem
Simulation of cultural transmission algorithms (three hypotheses) coupled with a market economy model, producing spatial distributions of cultural traits of traders of each settlement
Stochastic exploration of parameter space for each algorithm, evaluated in light of empirical data, through Approximate Bayesian Computation (ABC) + Population Monte Carlo (ABCPMC)
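The core logic of Approximate Bayesian Computation can be sketched with its simplest variant, rejection sampling. This is not the ABC-PMC procedure of Carrignon et al. (2020): the `simulate_presence` model, the observed count, and the tolerance are invented for illustration.

```python
import random

def simulate_presence(adoption_prob, n_sites, rng):
    """Hypothetical simulator: number of sites where a ware type ends up present."""
    return sum(rng.random() < adoption_prob for _ in range(n_sites))

rng = random.Random(2020)
n_sites = 50
observed = 18    # made-up "empirical" count of sites with the ware
tolerance = 2    # accept runs within +/- 2 sites of the observation

accepted = []
for _ in range(5000):
    p = rng.random()  # draw a parameter value from a uniform prior on [0, 1]
    simulated = simulate_presence(p, n_sites, rng)
    if abs(simulated - observed) <= tolerance:
        accepted.append(p)  # keep parameter values that reproduce the data

# The accepted values approximate the posterior distribution of the parameter
posterior_mean = sum(accepted) / len(accepted)
print(f"accepted {len(accepted)} of 5000 runs, posterior mean = {posterior_mean:.2f}")
```

Population Monte Carlo variants refine this by shrinking the tolerance over successive generations and resampling from the previously accepted values, which wastes far fewer simulation runs; comparing acceptance across competing models is what turns the same machinery into model selection.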

Cane turns into cannon!

Figures: Carrignon et al. 2020, Fig. 2 and 5

See also:
Carrignon, S., Bentley, R. A., & Ruck, D. (2019). Modelling rapid online cultural transmission:
Evaluating neutral models on Twitter data with approximate Bayesian computation.
Palgrave Communications, 5(1), Article 1. https://doi.org/10.1057/s41599-019-0295-9

Machine Learning for
surrogate model generation

Input-output relationship in mechanistic models is approximated with non-mechanistic (low fidelity) and computationally cheaper models (e.g. polynomials) (Forrester et al. 2008)
Higher simulation speed and lower computational costs necessary for:
- more extensive parameter/scenario exploration
- multi-objective calibration and optimisation
- increased model scale and complexity
- simulation using real-time data (e.g. digital twins)
Machine learning models can:
- build surrogate models given pre-defined building blocks (e.g. Greig & Arranz 2021)
- serve as surrogates themselves, i.e. trained to predict simulation runs with a limited amount of simulation data (e.g. Angione et al. 2022; Pfrommer et al. 2018)
Caveat/pitfall: justified when there is high confidence in the mechanism underlying model behaviour
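A surrogate in its simplest polynomial form can be sketched as follows. The `slow_simulation` function is a hypothetical stand-in for an expensive mechanistic model, and the quadratic fit is only one of many possible cheap approximators.

```python
import math
import random

def slow_simulation(x, rng):
    """Stand-in for an expensive mechanistic model (hypothetical saturating response)."""
    return 10 * (1 - math.exp(-x)) + rng.gauss(0, 0.3)

def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the normal equations."""
    rows = [[1.0, x, x * x] for x in xs]
    # Augmented 3x4 matrix [X^T X | X^T y]
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * y for r, y in zip(rows, ys))]
        for i in range(3)
    ]
    for col in range(3):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                factor = m[row][col] / m[col][col]
                m[row] = [a - factor * b for a, b in zip(m[row], m[col])]
    return [m[i][3] / m[i][i] for i in range(3)]

rng = random.Random(3)
train_x = [2 * i / 29 for i in range(30)]  # 30 "expensive" runs on [0, 2]
train_y = [slow_simulation(x, rng) for x in train_x]
a, b, c = fit_quadratic(train_x, train_y)

def surrogate(x):
    """Cheap approximation of the simulation's mean response, valid only on [0, 2]."""
    return a + b * x + c * x * x

for x in (0.5, 1.0, 1.5):
    print(f"x = {x}: surrogate = {surrogate(x):.2f}")
```

Once fitted, `surrogate` can be evaluated thousands of times at negligible cost, but it knows nothing about the mechanism: outside the sampled range its predictions are not to be trusted.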

KIDS meets KISS

Angione, C., Silverman, E., & Yaneske, E. (2022). Using machine learning as a surrogate model for agent-based simulations. PLOS ONE, 17(2), e0263150. https://doi.org/10.1371/journal.pone.0263150

Machine Learning for
simulation model components

Not strictly new in distributed simulation methods
Further exploring ABM roots and use in robotics
(Multi-Agent Systems)
Many learning algorithms directly compatible with ABM, e.g.:
- reinforcement learning (e.g. Angourakis et al. 2015)
- evolutionary algorithms
  (e.g. Sarker & Ray 2010, Li et al. 2023)
- neural networks!
Several ABM frameworks are compatible with ML
e.g. Belief-Desire-Intention agent architectures, OpenAI experiments with emergent agent behaviour
Many topics friendly to both simulation and ML components
e.g. collective action, cultural evolution, markets, etc.
Caveat/pitfall: optimal behaviour might not coincide with the historical/target behaviour
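How a learning algorithm slots into an agent can be sketched with a very small reinforcement learning example. The `ForagerAgent` class, the two patches, and their yields are hypothetical, and the epsilon-greedy value update is among the simplest possible learning rules.

```python
import random

PATCH_YIELD = {"A": 1.0, "B": 0.5}  # hypothetical mean returns of two foraging patches

class ForagerAgent:
    """Agent that learns patch values by trial and error (epsilon-greedy)."""

    def __init__(self, rng, learning_rate=0.1, exploration=0.1):
        self.rng = rng
        self.learning_rate = learning_rate
        self.exploration = exploration
        self.value = {patch: 0.0 for patch in PATCH_YIELD}

    def choose(self):
        if self.rng.random() < self.exploration:
            return self.rng.choice(list(self.value))  # explore a random patch
        return max(self.value, key=self.value.get)    # exploit the best estimate

    def learn(self, patch, reward):
        # Move the estimate a small step towards the observed reward
        self.value[patch] += self.learning_rate * (reward - self.value[patch])

rng = random.Random(5)
agents = [ForagerAgent(rng) for _ in range(5)]

for _ in range(1000):  # simulation steps
    for agent in agents:
        patch = agent.choose()
        reward = rng.gauss(PATCH_YIELD[patch], 0.2)  # noisy return from the patch
        agent.learn(patch, reward)

preferred = [max(agent.value, key=agent.value.get) for agent in agents]
print(preferred)
```

The agents converge on the higher-yield patch, which illustrates the caveat above: the learned behaviour is the rewarded one, and whether that matches the historical behaviour is a separate question.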

Simulation (artificial) intelligence

screenshot of OpenAI emergent behaviour example

3. Implications

Implications for simulation
of past socio-ecological systems

More domains, true transdisciplinarity (design)
Faster workflow (implementation, debugging, optimisation)
Higher computability (size)
Better calibration (input data)
Better validation tests (output data)
More expressive results
(analysis & visualisation)

⇛ Tractability of complex processes

Caveat: higher computational costs, expertise requirements, hybrid teams desirable

Examples from Indus Village model (to-do list)

Angourakis et al. 2022, graphical abstract

Angourakis et al. 2022, Quaternary | model repository: https://github.com/Andros-Spica/indus-village-model

⇑ simplification of Land Water model (runoff calculation)
(surrogate model generation)

use of ML component in agent decision-making ⇒
(model component)
sensitivity analysis for each submodel ⇒
(pattern detection)

⇐ validation of Weather and Land model to known present conditions
(parameter calibration and optimisation)
⇐ Conversion of paleoenvironmental and paleodemographic data (proxies) to model (mechanistic) variables
(pre-processing/selecting input data) ⇓

Andros-Spica/diagrams/RoadMapSoFar_2022-06.png

4. Conclusions

A few takes of Leonardo.AI (Diffusion XL) on
"machine learning and simulation cooperating
to reconstruct socio-ecological past".

Mechanism/explanation is the keystone of simulation models
Don't just compare simulation to ML (fitness fever), combine and complement
Simulation + ML:
many avenues available (low-hanging fruit) and others still to explore
Past socio-ecological systems call for modelling for the long haul (deep-time, complex, multi-disciplinary)

Thank you for your attention!

Andreas Angourakis | @AndrosSpica
