Machine Learning in Archaeology: Applications, History, and Challenges

admin
Aug 9, 2023
Updated: May 7

Machine learning has moved from a fringe tool in archaeological research to one of its fastest-growing methodologies. This article covers how that shift happened, which methods and subfields are most affected, what datasets researchers work with, and where the approach still falls short.

The History of Machine Learning in Archaeology

Machine learning has been applied to archaeological research since the early 1990s, but 80% of all published work on the subject appeared after 2018, according to a review of 135 papers spanning 1997 to 2022 (Bellat et al., 2025). The field developed across three phases.

The earliest precursors appeared in the 1960s and 1970s, when researchers such as Lewis Binford applied multivariate statistical analysis to lithic assemblages. Early neural networks and decision trees followed in the 1980s and 1990s for site prediction tasks. The 2010s brought GIS infrastructure and remote sensing data that made large, structured datasets available for the first time.

From 2019 onward, publications grew by 82% from one two-year period to the next, and artificial neural network adoption increased fourfold. Between January 2023 and September 2024 alone, researchers identified 278 unique records on the topic — 12% more than the entire 2021–2022 total.

Why ML adoption accelerated after 2018

Several factors drove this shift:

Large annotated image datasets became publicly available for training computer vision models on satellite and aerial imagery
Deep learning frameworks (TensorFlow, PyTorch) lowered the technical barrier for non-computer-science researchers
Remote sensing platforms such as Google Earth Engine made terabyte-scale geospatial data accessible without local infrastructure
Archaeological journals began accepting computational methods papers more regularly

How Machine Learning is Used in Archaeology

Archaeological research generates large volumes of heterogeneous data: satellite images, soil composition records, artifact photographs, LiDAR point clouds, and historical texts. Machine learning algorithms are trained on this data to detect patterns that would take human analysts far longer to identify by hand.

The approach is used alongside traditional fieldwork, not as a replacement. Researchers use ML to narrow down areas worth excavating, classify finds more consistently, and process archival material at scale.

Applications by subfield

Remote sensing and site detection

ML models trained on satellite and aerial imagery can identify surface signatures of buried structures — subtle variations in soil color, vegetation density, and topography. These signals are often invisible to the naked eye but detectable when a model has been trained on confirmed site locations.
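
Vegetation density, one of the signals mentioned above, is often summarized with the Normalized Difference Vegetation Index (NDVI), computed from red and near-infrared reflectance. The sketch below is illustrative only: the reflectance grids are invented, and the helper names (ndvi_grid, flag_anomalies) and the z-score threshold are assumptions rather than anything taken from a cited study.

```python
from statistics import mean, pstdev

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); returns 0.0 when both bands are zero."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def ndvi_grid(nir_band, red_band):
    """Compute NDVI cell by cell for two equally sized 2D reflectance grids."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]

def flag_anomalies(grid, z_threshold=2.0):
    """Return (row, col) cells whose NDVI lies more than z_threshold
    standard deviations away from the grid mean."""
    values = [v for row in grid for v in row]
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [(i, j)
            for i, row in enumerate(grid)
            for j, v in enumerate(row)
            if abs(v - mu) / sigma > z_threshold]

# Hypothetical 4x4 reflectance grids: one cell has stressed vegetation.
NIR = [[0.6] * 4 for _ in range(4)]
RED = [[0.1] * 4 for _ in range(4)]
NIR[2][1], RED[2][1] = 0.3, 0.2

if __name__ == "__main__":
    print(flag_anomalies(ndvi_grid(NIR, RED)))
```

A simple threshold like this only highlights cells that differ from their surroundings. In practice a flagged cell is a candidate for closer inspection, not evidence of a buried structure.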

One of the best-documented examples is the use of machine learning algorithms to locate lost Maya cities in the dense jungles of Central America. Models trained on multispectral satellite data flagged anomalies that led researchers to previously undiscovered sites.

A study by Bachagha et al. (2023) demonstrated this at scale in North Africa. The researchers built a workflow integrating SAR and Pleiades satellite imagery with spatial analysis in Google Earth Engine, using a random forest classifier to automate the detection of fortified Roman-period sites in Tunisia. The model identified both known and previously unrecorded fortifications with high precision, and its design was specifically tested for applicability in arid, data-sparse environments.

Artifact classification

Computer vision models — primarily convolutional neural networks (CNNs) — are used to classify artifacts from photographs. Tasks include typological classification of ceramics, lithics, and coins, as well as damage assessment and provenance analysis.

Research by Qiang Zhao (2021) demonstrated the application of decision tree and gradient boosting algorithms to identify artifact sites across China. The study reported 98% classification accuracy and found that a majority of sites clustered near ancient harbors and in the South China region.

Artifact restoration and text recovery

Researchers have used ML to virtually unroll fragile ancient scrolls that would be destroyed by physical handling. The same approach applies to reconstruction tasks: reassembling broken pottery from fragment photographs, completing gaps in damaged inscriptions, and deciphering degraded manuscripts.

NLP and text mining techniques are also being applied to excavation reports and archaeological literature to extract structured data from unstructured text — a task that would take years to do manually across large archives.
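
To make the idea of extracting structured data from report text concrete, here is a minimal rule-based sketch. It uses a regular expression, not a trained NLP model, and the report sentences, the pattern, and the field names are all invented for illustration. Real excavation reports vary far more in wording than this pattern allows.

```python
import re

# Hypothetical report text; not taken from any real excavation report.
REPORT = (
    "Trench 4 yielded 12 ceramic sherds at a depth of 0.8 m. "
    "Trench 7 yielded 3 bronze coins at a depth of 1.25 m."
)

# Assumed sentence template: "Trench N yielded K <find type> at a depth of D m"
PATTERN = re.compile(
    r"Trench (?P<trench>\d+) yielded (?P<count>\d+) (?P<find>[a-z ]+?) "
    r"at a depth of (?P<depth_m>\d+(?:\.\d+)?) m"
)

def extract_finds(text):
    """Turn every sentence matching the template into a structured record."""
    records = []
    for match in PATTERN.finditer(text):
        records.append({
            "trench": int(match["trench"]),
            "count": int(match["count"]),
            "find": match["find"],
            "depth_m": float(match["depth_m"]),
        })
    return records

if __name__ == "__main__":
    for record in extract_finds(REPORT):
        print(record)
```

A rule-based baseline like this breaks as soon as the phrasing changes, which is one reason learned extraction models are attractive for large, inconsistent archives.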

Bioarchaeology and taphonomy

In bioarchaeology, ML is used to classify skeletal remains, estimate age and sex from bone morphology, and identify pathological features. Taphonomic analysis — which involves interpreting how bones were modified after death — has been approached with supervised classification models trained on experimental assemblages.

Predictive modeling and landscape archaeology

Predictive models use environmental and geospatial variables (slope, soil type, proximity to water, elevation) to estimate the probability of archaeological deposits across unexcavated landscapes. These models help survey teams allocate limited fieldwork budgets more efficiently.
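
As a rough illustration of how such a model turns environmental variables into a probability, the sketch below fits a small logistic regression by gradient descent using only the Python standard library. The training cells, the fixed scaling constants, and the function names are hypothetical, and logistic regression is just one simple choice among the model types used for this task.

```python
import math

# Hypothetical survey cells: (slope in degrees, distance to water in km, site present?)
CELLS = [
    (2.0, 0.2, 1), (3.0, 0.5, 1), (1.0, 0.3, 1), (4.0, 0.4, 1),
    (20.0, 3.0, 0), (15.0, 2.5, 0), (25.0, 4.0, 0), (18.0, 3.5, 0),
]

def scale(slope, dist):
    # Fixed scaling so both inputs fall roughly in [0, 1]; an assumption for this toy data.
    return (slope / 30.0, dist / 5.0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(cells, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    w = [0.0, 0.0]
    b = 0.0
    n = len(cells)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for slope, dist, y in cells:
            x = scale(slope, dist)
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / n, w[1] - lr * grad_w[1] / n]
        b -= lr * grad_b / n
    return w, b

def site_probability(model, slope, dist):
    """Estimated probability that a cell with these attributes contains a site."""
    w, b = model
    x = scale(slope, dist)
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

if __name__ == "__main__":
    model = fit(CELLS)
    print(site_probability(model, 2.0, 0.3))   # flat ground near water
    print(site_probability(model, 22.0, 3.5))  # steep ground far from water
```

A survey team could rank unexcavated cells by this probability and visit the highest-scoring ones first. With only eight invented cells the numbers carry no real meaning.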

Dominant Algorithms in Archaeological ML

The following algorithms appear most frequently across published studies:

Random forests: used extensively in remote sensing and site detection tasks due to their robustness with high-dimensional geospatial data
Convolutional neural networks (CNNs): the standard approach for image classification tasks including artifact typology and satellite feature detection
Decision trees and gradient boosting: applied in artifact classification and site prediction where interpretability matters
Support vector machines (SVMs): common in earlier (pre-2018) studies for classification tasks with limited training data
Recurrent neural networks and transformer models: used in NLP tasks applied to archaeological texts and excavation reports

The Bellat et al. (2025) review found that artificial neural networks (encompassing CNNs and related architectures) saw fourfold adoption growth between 2019 and 2022, making them the fastest-growing method category in the field.

Datasets and benchmarks used in archaeological ML

Most archaeological ML studies rely on a combination of publicly available geospatial datasets and purpose-built annotated corpora. The following are among the most referenced:

Sentinel-1 and Sentinel-2 (ESA Copernicus) — free multispectral and SAR satellite imagery, used in remote sensing studies across multiple continents
Pleiades imagery (Airbus) — very high-resolution optical satellite data, used in the Bachagha et al. Tunisia study; access is commercial and represents a barrier for many research teams
Google Earth Engine — not a dataset itself, but the primary platform for processing large-scale geospatial data in ML pipelines
LiDAR open datasets — varying by region; used for detecting earthworks, mounds, and buried structures in forested environments
ArSe (Archaeological Semantic Enrichment) — a structured dataset for testing NLP extraction from archaeological literature
Experimentally derived osteological datasets — used in bioarchaeology and taphonomy classification studies; typically built and published by individual research groups

A consistent finding across the literature is that annotated, standardized datasets for archaeological ML are limited. Most studies construct their own training data from a combination of known site coordinates and open geospatial imagery.

Challenges and Limitations of ML in Archaeology

ML adoption in archaeology faces several structural and methodological constraints.

Data scarcity and annotation cost

Archaeological datasets are small by machine learning standards. Confirmed site coordinates, annotated artifact photographs, and labeled skeletal data are expensive to produce and rarely shared across institutions. Most published models are trained on hundreds to low thousands of labeled examples — far below the volumes that produce robust generalization in other domains.

Spatial and temporal transferability

A model trained on Roman-period sites in North Africa may not transfer to Bronze Age sites in Southeast Asia. Regional geology, land use, vegetation type, and archaeological traditions all affect the surface signatures that models learn to detect. Published models are frequently validated on held-out subsets of the same region and period they were trained on, which can inflate apparent accuracy.
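
One common way to get a more honest estimate is to hold out whole spatial blocks rather than randomly chosen samples, so that test locations are not immediate neighbors of training locations. The sketch below shows the idea with a simple grid. The sample records, the 10 km block size, and the function names are assumptions made for illustration.

```python
def block_id(x_km, y_km, block_size_km):
    """Map a coordinate to the (column, row) index of its grid block."""
    return (int(x_km // block_size_km), int(y_km // block_size_km))

def spatial_holdout(samples, block_size_km, held_out_blocks):
    """Split samples so every sample in a held-out block goes to the test set."""
    train, test = [], []
    for sample in samples:
        block = block_id(sample["x_km"], sample["y_km"], block_size_km)
        (test if block in held_out_blocks else train).append(sample)
    return train, test

# Hypothetical labeled samples with projected coordinates in kilometres.
SAMPLES = [
    {"id": "a", "x_km": 1.0, "y_km": 2.0, "is_site": 1},
    {"id": "b", "x_km": 3.5, "y_km": 1.0, "is_site": 0},
    {"id": "c", "x_km": 12.0, "y_km": 4.0, "is_site": 1},
    {"id": "d", "x_km": 14.5, "y_km": 8.0, "is_site": 0},
    {"id": "e", "x_km": 6.0, "y_km": 13.0, "is_site": 1},
]

if __name__ == "__main__":
    train, test = spatial_holdout(SAMPLES, 10.0, {(1, 0)})
    print([s["id"] for s in train], [s["id"] for s in test])
```

Rotating which blocks are held out gives a spatial form of cross-validation. It tests transfer within a region only, so it says nothing about transfer to a different region or period.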

Access to high-resolution imagery

Very High-Resolution (VHR) satellite data, which resolves features below 1 meter, is often necessary for precise site detection. Commercial access costs are often prohibitive for academic research groups, particularly in low-resource settings. Several published studies explicitly list this as a limiting factor.

Reproducibility and methodological standardization

The Bellat et al. (2025) review identified limited reproducibility across studies: different groups use different preprocessing pipelines, train/test split strategies, and evaluation metrics, making direct comparisons unreliable. There is no equivalent of the benchmark datasets and shared evaluation protocols that structure progress in computer vision or NLP.
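
Two small habits address part of this problem: fixing the random seed used for the train/test split, and reporting the same clearly defined metrics every time. The sketch below shows both with the standard library only. The function names and the example labels are hypothetical, and the choice of precision, recall, and F1 is an assumption rather than a standard the review prescribes.

```python
import random

def seeded_split(items, test_fraction, seed):
    """Shuffle with a fixed seed so the same split can be regenerated later."""
    order = list(items)
    random.Random(seed).shuffle(order)
    n_test = round(len(order) * test_fraction)
    return order[n_test:], order[:n_test]  # (train, test)

def precision_recall_f1(y_true, y_pred):
    """Binary metrics where label 1 means 'site' and label 0 means 'no site'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

if __name__ == "__main__":
    train, test = seeded_split(range(10), 0.3, seed=42)
    print(len(train), len(test))
    # Hypothetical ground truth and model predictions.
    print(precision_recall_f1([1, 1, 1, 0, 0, 0, 0, 1],
                              [1, 1, 0, 0, 0, 1, 0, 1]))
```

Publishing the seed, the split code, and the metric definitions alongside a paper lets another group regenerate the same evaluation instead of approximating it.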

Interpretability

Random forests and gradient boosting models offer some feature importance output, but CNN-based models offer limited interpretability. For archaeological applications — where the goal is not just prediction but understanding — the inability to explain what a model learned is a meaningful constraint.
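
Permutation-style importance is one model-agnostic way to probe what a trained model relies on: scramble one input column and measure how much accuracy drops. The sketch below is a simplified, deterministic variant that averages over all rotations of a column instead of using random shuffles. The toy model, the data, and the function names are hypothetical.

```python
def accuracy(model, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

def rotation_importance(model, rows, labels, n_features):
    """Average accuracy drop when one feature column is rotated out of alignment.
    Rotations stand in for random shuffles to keep the example deterministic."""
    baseline = accuracy(model, rows, labels)
    n = len(rows)
    importances = []
    for f in range(n_features):
        drops = []
        for shift in range(1, n):
            permuted = [list(r) for r in rows]
            for i in range(n):
                permuted[i][f] = rows[(i + shift) % n][f]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances.append(sum(drops) / len(drops))
    return importances

# Hypothetical black-box model: predicts a site (1) when slope is under 10 degrees.
def toy_model(row):
    slope_deg, _dist_km = row
    return 1 if slope_deg < 10 else 0

# Hypothetical rows: [slope in degrees, distance to water in km]
ROWS = [[2, 0.2], [3, 0.5], [1, 0.3], [4, 0.4],
        [20, 3.0], [15, 2.5], [25, 4.0], [18, 3.5]]
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    print(rotation_importance(toy_model, ROWS, LABELS, n_features=2))
```

Here the second feature scores zero because the toy model never reads it. A score like this shows which inputs a model depends on, not why they matter archaeologically, so it addresses only part of the interpretability gap.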

Exploring the Past and Future with Machine Learning

The use of machine learning in archaeology is a striking example of how technology can transform even the most traditional fields. Machine learning is not just about future-forward applications; it can also help us delve into and better understand our past.

Frequently Asked Questions

What is machine learning in archaeology?

Machine learning in archaeology refers to the use of ML algorithms to analyze archaeological data, including satellite imagery, artifact photographs, and site records. It helps researchers detect patterns, identify potential dig sites, and process data at a scale that manual analysis cannot match.

How does AI help find archaeological sites?

AI models are trained on satellite images and remote sensing data to detect surface-level changes associated with human activity, such as shifts in soil color or vegetation density. These models can scan large areas of land and flag locations worth investigating, reducing the time and cost of field surveys.

Can machine learning restore ancient artifacts?

Yes. ML has been used to virtually unroll fragile scrolls, reconstruct broken pottery, and complete gaps in ancient texts. Computer vision techniques in particular are well-suited for artifact analysis, as they can process and interpret visual data from damaged or incomplete objects.

What datasets are used for machine learning in archaeology?

Commonly used data sources include Sentinel-1/2 imagery from ESA Copernicus, Pleiades commercial imagery for high-resolution tasks, and purpose-built annotated datasets assembled by individual research teams. Google Earth Engine is not a dataset itself but is the platform most often used to process this imagery. Standardized, shared benchmark datasets remain rare in the field.

What machine learning algorithms are most used in archaeology?

Random forests, convolutional neural networks (CNNs), decision trees, gradient boosting, and support vector machines are the most referenced methods across the published literature. Artificial neural networks, a category that includes CNNs, have grown fastest since 2019, particularly in artifact classification and remote sensing applications.
