Simulation, Machine Learning, and their Synergy in Addressing Past Socio-Environmental Systems

Andreas Angourakis | @AndrosSpica

available at https://andros-spica.github.io/IAMAHA-Angourakis-2023/
https://andros-spica.github.io/IAMAHA-Angourakis-2023/index.html?print-pdf (printable version)
ReForm logo
reform.ressourcencampus-bochum.de
Leipzig logo   RUB logo
Logos                 MPJ logo     UzK logo  

Outline

  1. Simulation versus machine learning?
  2. Simulation + machine learning →  examples
  3. Implications for modelling past socio-ecological systems
  4. Conclusions

1. Simulation versus machine learning?

An "Age" of machine learning

  • Deep learning made Artificial Intelligence pop (again)

  • Data Science, Big Data and low-cost/high-performance computing

  • Simulation associated with "old" Artificial Intelligence

  • Is modelling a fitness contest? (pun intended)

  • When to use: purposes and context

  • Simulation in archaeology:
    "Can you validate your model?"*

    * Epstein, J. M. (2008). Why Model? Journal of Artificial Societies and Social Simulation, 11(4), 12.

AI hype vs reality
Flovik, V. (2023, January 25). Machine Learning: From hype to real-world applications. Medium.

How similar or different are machine learning and simulation models?

The epistemological context

Diagrams: models to math models; types of math models

Diagrams available at https://github.com/Andros-Spica/modelling-simulation-graphs
Icon assets from "The Noun Project" (thenounproject.com)



                    Input              Model                             Output
Machine Learning    known              learned (after implementation)    known + predicted (after training)
Simulation          known + assumed    known (before implementation)     learned (after iterations)

2. Simulation + Machine Learning

Moving forward...

If both approaches serve different epistemological purposes, why should we choose one over the other?

How can simulation and machine learning be combined into the same methodology?

simulation and ML

Icon assets from "The Noun Project" (thenounproject.com)

von Rueden, L., Mayer, S., Sifa, R., Bauckhage, C., & Garcke, J. (2020). Combining Machine Learning and Simulation to a Hybrid Modelling Approach: Current and Future Directions. In M. R. Berthold, A. Feelders, & G. Krempl (Eds.), Advances in Intelligent Data Analysis XVIII (Vol. 12080, pp. 548–560). Springer International Publishing. https://doi.org/10.1007/978-3-030-44584-3_43

Simulation for
the creation of training datasets

Jareño, S. J. N., Helden, D. P. van, Mirkes, E. M., Tyukin, I. Y., & Allison, P. M. (2021). Learning from Scarce Information: Using Synthetic Data to Classify Roman Fine Ware Pottery. Entropy, 23(9). https://doi.org/10.3390/e23091140

  • 5373 images of 162 terra sigillata from Roman Britain
  • A classification problem
  • Simulation (or procedural generation) of 1000+ images per class using Matplotlib and Blender (experimentation)
  • Taking advantage of deep learning, while minimising the curse of dimensionality
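
The procedural-generation idea can be sketched in a few lines. This is only an illustration, not the pipeline of Jareño et al. (who rendered images with Matplotlib and Blender). It assumes two hypothetical vessel classes, each defined by a made-up profile function, and produces many noisy variants per class as labelled training samples. All names (`bowl_profile`, `plate_profile`, `make_dataset`) and parameter values are assumptions made for this example.

```python
import math
import random


def bowl_profile(t, depth):
    """Hypothetical bowl: radius follows a quarter sine along the height t in [0, 1]."""
    return math.sin(t * math.pi / 2) * depth


def plate_profile(t, depth):
    """Hypothetical plate: radius grows linearly and stays shallow."""
    return t * depth * 0.3


# Illustrative classes only; real classes would come from the typology.
CLASSES = {"bowl": bowl_profile, "plate": plate_profile}


def synth_sample(profile, rng, n_points=20, noise=0.02):
    """One synthetic profile: random overall size plus small measurement noise."""
    depth = rng.uniform(0.8, 1.2)
    return [
        profile(i / (n_points - 1), depth) + rng.gauss(0, noise)
        for i in range(n_points)
    ]


def make_dataset(n_per_class=1000, seed=42):
    """Generate n_per_class labelled samples for every class, then shuffle."""
    rng = random.Random(seed)
    data = []
    for label, profile in CLASSES.items():
        for _ in range(n_per_class):
            data.append((synth_sample(profile, rng), label))
    rng.shuffle(data)
    return data


dataset = make_dataset()
print(len(dataset), "labelled samples,", len(dataset[0][0]), "features each")
```

The resulting list of (features, label) pairs could then be fed to any classifier. The point is that the simulation, not the excavation record, sets the size of the training set.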

Simulation datasets are (potentially) "Big Data"

Jareño et al. 2021, Fig. 8 and 9


Machine Learning for
model generation

  • Immense (and growing) collection of Large Language Models (LLMs) of various kinds and domains
    e.g. ChatGPT, Google Bard, GitHub Copilot, etc.
  • Conversational technique as your own "think tank"
  • Variable fluency with programming languages
    e.g. Python > NetLogo
  • Potential to use data from the Web and local files
  • Technologies in "beta" - e.g. AgentGPT
  • Iterative process that still requires a lot of know-how in both simulation and ML

You are not alone anymore!

chatGPT example AgentGPT example


Machine Learning for
pattern detection in simulation output

  • Less novel and less hyped, yet more important
  • Realm of exploratory and inferential statistics
    e.g. linear models, dimensionality reduction (PCA, CA, etc.), clustering (hierarchical, k-means, etc.), Bayesian statistics, etc.
  • Good, fairly consolidated practice:
    sensitivity analysis with Random Forest (or similar) on randomised exploration of parameters
  • Many sources available (publications, libraries/packages, scripts)
  • Always remember the nature of simulation output (e.g. account for model and experiment biases)
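
A minimal sketch of the workflow behind sensitivity analysis on a randomised exploration of parameters: sample parameters at random, run the model, then rank parameters by how strongly they relate to the output. To stay dependency-free, this example swaps Random Forest importance for a much cruder index, the absolute Pearson correlation between each parameter and the output. In practice one would more likely use something like scikit-learn's `RandomForestRegressor` and its `feature_importances_`. The model `toy_model` and its three parameters are hypothetical.

```python
import math
import random


def toy_model(growth, mortality, rng, noise_sd=0.5):
    """Hypothetical stochastic model: strong effect of growth, weak effect of mortality."""
    return 10.0 * growth - 2.0 * mortality + rng.gauss(0, noise_sd)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


PARAMETERS = ("growth", "mortality", "dummy")  # "dummy" never enters the model

rng = random.Random(1)
samples, outputs = [], []
for _ in range(5000):
    # Randomised exploration: every parameter drawn uniformly from [0, 1].
    params = {name: rng.uniform(0, 1) for name in PARAMETERS}
    samples.append(params)
    outputs.append(toy_model(params["growth"], params["mortality"], rng))

# Crude sensitivity index: absolute correlation with the output.
sensitivity = {
    name: abs(pearson([s[name] for s in samples], outputs))
    for name in PARAMETERS
}
for name, value in sorted(sensitivity.items(), key=lambda kv: -kv[1]):
    print(f"{name:10s} {value:.3f}")
```

Including a parameter that the model ignores (`dummy`) gives a baseline for what "no effect" looks like under the same sampling noise. A correlation index only captures monotonic, roughly linear effects, which is one reason tree-based methods are preferred for real models.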

After 10,000 runs...
the headache comes

Angourakis, A., Alcaina-Mateos, J., Madella, M., & Zurro, D. (2022). Human-Plant Coevolution: A modelling framework for theory-building on the origins of agriculture. PLOS ONE, 17(9), e0260904. https://doi.org/10.1371/journal.pone.0260904
Angourakis, A., Salpeteur, M., Martínez Ferreras, V., & Gurt Esparraguera, J. M. (2017). The Nice Musical Chairs Model: Exploring the Role of Competition and Cooperation Between Farming and Herding in the Formation of Land Use Patterns in Arid Afro-Eurasia. Journal of Archaeological Method and Theory, 24(4), 1177–1202. https://doi.org/10.1007/s10816-016-9309-8
Angourakis et al. 2017, Fig. 7 and Angourakis et al. 2022, Fig. 7


Machine Learning for
parameter calibration and optimisation

  • Old problem, but still presenting challenges
  • Effective parameter calibration is one of the missing links between "toy model" and "digital twin"
  • Several well-consolidated algorithms for linear and non-linear models
    e.g., those available with optim() in R, Genetic Algorithms in OpenMOLE, etc.
  • Challenges grow with the number of parameters and with stochasticity

Treading an invisible, multidimensional landscape

Demonstration of parameter optimisation problems when fitting target data with stochastic models
top: double logistic model (5 parameters) fitted to (unknown) model output;
middle: stepped double logistic curve (7 parameters) fitted to (unknown) model output;
bottom: stepped double logistic curve (7 parameters, 14 hyperparameters) (guess based on target aggregate statistics) versus target data (green)
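
The difficulty of fitting a stochastic model can be illustrated with a deliberately small case. The sketch below is not the demonstration shown in the figure. It assumes a single logistic curve with two parameters (rather than a double logistic), additive Gaussian noise, and plain random search within assumed bounds as the optimiser. The loss is averaged over several replicates because the same parameter values give a different output on every run. All names and values are hypothetical.

```python
import math
import random


def stochastic_logistic(rate, midpoint, rng, steps=50, noise_sd=0.03):
    """Hypothetical stochastic model: logistic curve plus Gaussian noise."""
    return [
        1 / (1 + math.exp(-rate * (t - midpoint))) + rng.gauss(0, noise_sd)
        for t in range(steps)
    ]


def loss(params, target, rng, replicates=10):
    """Mean squared error against the target, averaged over replicate runs."""
    rate, midpoint = params
    total = 0.0
    for _ in range(replicates):
        simulated = stochastic_logistic(rate, midpoint, rng)
        total += sum((s - y) ** 2 for s, y in zip(simulated, target)) / len(target)
    return total / replicates


def calibrate(target, rng, iterations=2000):
    """Random search: sample candidates within assumed bounds, keep the best."""
    best, best_loss = None, float("inf")
    for _ in range(iterations):
        candidate = (rng.uniform(0.01, 1.0), rng.uniform(0.0, 50.0))
        candidate_loss = loss(candidate, target, rng)
        if candidate_loss < best_loss:
            best, best_loss = candidate, candidate_loss
    return best, best_loss


rng = random.Random(7)
TRUE_PARAMS = (0.3, 25.0)  # known here only because the target is synthetic
target = stochastic_logistic(*TRUE_PARAMS, rng)
best, best_loss = calibrate(target, rng)
print("estimated (rate, midpoint):", best, "loss:", best_loss)
```

Even in this two-parameter case the best loss never reaches zero, because both the target and each simulated run carry noise. With more parameters, random search covers the space ever more thinly, which is where the more consolidated optimisers mentioned above come in.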


Machine Learning for
model selection + pattern detection in output + parameter calibration and optimisation

Carrignon, S., Brughmans, T., & Romanowska, I. (2020). Tableware trade in the Roman East: Exploring cultural and economic transmission with agent-based modelling and approximate Bayesian computation. PLOS ONE, 15(11), e0240414. https://doi.org/10.1371/journal.pone.0240414

  • Tableware trade in the eastern Mediterranean (Hellenistic and Roman periods)
  • 8730 fragments, 5 types, 178 sites (presence/absence per type-site)
  • A cultural evolution problem
  • Simulation of cultural transmission algorithms (three hypotheses) coupled with a market economy model, producing spatial distributions of the cultural traits held by the traders of each settlement
  • Stochastic exploration of parameter space for each algorithm, evaluated in light of empirical data, through Approximate Bayesian Computation (ABC) + Population Monte Carlo (ABCPMC)
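
The core logic of Approximate Bayesian Computation can be sketched with its simplest variant, rejection sampling. This is not the ABC-PMC scheme used by Carrignon et al., and the model is a hypothetical stand-in: each of 178 sites independently holds a ware with probability `p_adopt`, and the summary statistic is the number of sites where it is present. Parameter values are drawn from a flat prior and kept only when the simulated summary falls within a tolerance of the observed one.

```python
import random

N_SITES = 178  # number of sites, as in the dataset described above


def simulate_presence(p_adopt, rng, n_sites=N_SITES):
    """Hypothetical model: count of sites where the ware is present."""
    return sum(rng.random() < p_adopt for _ in range(n_sites))


def abc_rejection(observed, rng, n_draws=10000, tolerance=3):
    """Keep prior draws whose simulated summary is close to the observed one."""
    accepted = []
    for _ in range(n_draws):
        p = rng.uniform(0, 1)  # draw from a flat prior
        simulated = simulate_presence(p, rng)
        if abs(simulated - observed) <= tolerance:  # distance between summaries
            accepted.append(p)
    return accepted


rng = random.Random(3)
# Synthetic "observation" generated with a known value, for illustration only.
observed = simulate_presence(0.35, rng)
posterior = abc_rejection(observed, rng)
posterior_mean = sum(posterior) / len(posterior)
print(f"accepted {len(posterior)} of 10000 draws, posterior mean {posterior_mean:.3f}")
```

Most draws are rejected, which is why sequential schemes such as Population Monte Carlo, which progressively concentrate sampling where acceptance is likely, become attractive once each simulation run is expensive.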

Cane turns into cannon!



Carrignon et al. 2020, Fig. 2 and 5

See also:
Carrignon, S., Bentley, R. A., & Ruck, D. (2019). Modelling rapid online cultural transmission:
Evaluating neutral models on Twitter data with approximate Bayesian computation.
Palgrave Communications, 5(1), Article 1. https://doi.org/10.1057/s41599-019-0295-9


Machine Learning for
surrogate model generation

  • Input-output relationship in mechanistic models is approximated with non-mechanistic (low fidelity) and computationally cheaper models (e.g. polynomials) (Forrester et al. 2008)
  • Higher simulation speed and lower computational costs necessary for:
    • more extensive parameter/scenario exploration
    • multi-objective calibration and optimisation
    • increased model scale and complexity
    • simulation using real-time data (e.g. digital twins)
  • Machine learning models can:
  • Caveat/pitfall: only justified when there is high confidence in the mechanisms underlying model behaviour
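
A minimal sketch of the surrogate idea, using the polynomial case mentioned above. The "expensive" model `slow_simulation` is a hypothetical stand-in with one input. A quadratic polynomial is fitted to a sample of its outputs by ordinary least squares, written out with the normal equations and a small Gaussian elimination so the example needs no external library. The fitted polynomial can then be evaluated in place of the simulation.

```python
import random


def slow_simulation(x, rng):
    """Stand-in for an expensive mechanistic model (hypothetical)."""
    return 2.0 + 0.5 * x - 0.1 * x ** 2 + rng.gauss(0, 0.05)


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small linear system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_polynomial(xs, ys, degree=2):
    """Least-squares polynomial fit via the normal equations (low degrees only)."""
    powers = range(degree + 1)
    a = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in powers]
    return solve(a, b)


def surrogate(coeffs, x):
    """Evaluate the fitted polynomial: a cheap replacement for slow_simulation."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


rng = random.Random(11)
xs = [rng.uniform(0, 5) for _ in range(200)]  # design of (expensive) runs
ys = [slow_simulation(x, rng) for x in xs]
coeffs = fit_polynomial(xs, ys)
print("surrogate coefficients:", [round(c, 3) for c in coeffs])
print("surrogate prediction at x = 2.5:", round(surrogate(coeffs, 2.5), 3))
```

The surrogate is only trustworthy inside the sampled input range (here 0 to 5), and it reproduces the input-output relationship without any of the mechanism, which is the caveat stated in the last bullet.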

KIDS meets KISS


Machine Learning for
simulation model components

Simulation (artificial) intelligence

3. Implications

Implications for simulation
of past socio-ecological systems

  • More domains, true transdisciplinarity (design)
  • Faster workflow (implementation, debugging, optimisation)
  • Higher computability (size)
  • Better calibration (input data)
  • Better validation tests (output data)
  • More expressive results
    (analysis & visualisation)


⇛ Tractability of complex processes

Caveat: higher computational costs, expertise requirements, hybrid teams desirable

Examples from the Indus Village model (to-do list)

Angourakis et al. 2022, graphical abstract

Angourakis et al. 2022, Quaternary | model repository: https://github.com/Andros-Spica/indus-village-model

  • Simplification of Land Water model (runoff calculation)
    (surrogate model generation)
  • Use of ML component in agent decision-making
    (model component)
  • Sensitivity analysis for each submodel
    (pattern detection)
  • Validation of Weather and Land model against known present conditions
    (parameter calibration and optimisation)
  • Conversion of paleoenvironmental and paleodemographic data (proxies) to model (mechanistic) variables
    (pre-processing/selecting input data)

Andros-Spica/diagrams/RoadMapSoFar_2022-06.png

4. Conclusions


A few takes of Leonardo.AI (Diffusion XL) on
"machine learning and simulation cooperating
to reconstruct socio-ecological past".

  • Mechanism/explanation is the keystone of simulation models
  • Don't just compare simulation to ML (fitness fever); combine and complement them
  • Simulation + ML:
    many avenues available (low-hanging fruit) and others still to explore
  • Past socio-ecological systems call for modelling for the long haul (deep-time, complex, multi-disciplinary)

Thank you for your attention!

Andreas Angourakis | @AndrosSpica

IAMAHA 2023 - Nice
Angourakis
28 November 2023
https://andros-spica.github.io/IAMAHA-Angourakis-2023/