8 Databases (I)

8.1 Introduction to Databases in Archaeology

What is a Database?
- Definition of databases in archaeology.
  - A database is an organized collection of data that is stored and accessed electronically. In archaeology, databases are crucial for storing, retrieving, and managing vast amounts of data collected during excavations and research. They help archaeologists efficiently organize and analyze information, ensuring that valuable data is preserved and easily accessible for future studies.
- Importance of Databases in Archaeology:
  1. Data Organization: Databases allow archaeologists to systematically organize data, making it easier to manage and retrieve information. This is essential for maintaining detailed records of archaeological findings.
  2. Data Preservation: Databases ensure that archaeological data is preserved in a digital format, protecting it from physical degradation and loss. This is particularly important for long-term research projects.
  3. Data Sharing: Databases facilitate data sharing among researchers, enabling collaboration and the exchange of information. This enhances the overall quality of archaeological research.
  4. Data Analysis: Databases support complex data analysis, allowing archaeologists to identify patterns, trends, and relationships within the data. This aids in the interpretation of archaeological findings.
- Example: Common Uses of Databases for Managing Archaeological Data:
  1. Artifacts: Databases can store detailed information about artifacts, including descriptions, photographs, dimensions, materials, and provenance. This allows archaeologists to catalog and track artifacts systematically. For example, an artifact database might include fields for the artifact’s type, material, date of discovery, and associated excavation site.
  2. Excavation Records: Databases can manage excavation records, such as site plans, stratigraphic layers, context sheets, and field notes. This helps in documenting the excavation process and maintaining a comprehensive record of findings. For example, an excavation database might include fields for the site name, excavation date, stratigraphic unit, and associated artifacts.
  3. Research Data: Databases can store research data, including radiocarbon dates, chemical analyses, and spatial data from Geographic Information Systems (GIS). This facilitates data analysis and interpretation. For example, a research database might include fields for sample ID, analysis type, date, and results.
  4. Bibliographic References: Databases can manage bibliographic references and literature reviews, helping archaeologists keep track of relevant publications and sources. For example, a bibliographic database might include fields for the author, title, publication year, and abstract.
Types of Databases
- Overview of relational vs. non-relational databases.
  1. Relational Databases: Relational databases use tables to store data and relationships between data. They follow the relational model, where data is organized into tables with rows and columns. Each table represents a specific entity, and relationships between tables are established using keys (primary and foreign keys). Relational databases use SQL (Structured Query Language) for querying and managing data. Examples include MySQL, PostgreSQL, Oracle, and Microsoft SQL Server.
    - Advantages:
      1. Data Integrity: Relational databases enforce data integrity through constraints and relationships between tables. This ensures that data is consistent and accurate.
      2. Structured Data: Relational databases are well-suited for structured data, such as artifact catalogs and excavation records.
      3. Complex Queries: Relational databases support complex queries using SQL, enabling detailed searches and analyses.
      4. Scalability: Relational databases can handle large datasets and support concurrent access by multiple users.
      5. Data Security: Relational databases provide robust security features, including user authentication, access controls, and encryption.
  2. Non-Relational Databases: Non-relational databases, also known as NoSQL databases, are designed for unstructured or semi-structured data. They can handle large volumes of diverse data types and are often used for big data and real-time web applications. NoSQL databases include various types, such as document stores (e.g., MongoDB), key-value stores (e.g., Redis), column-family stores (e.g., Cassandra), and graph databases (e.g., Neo4j).
    - Advantages:
      1. Flexibility: Non-relational databases can store unstructured and semi-structured data, making them suitable for diverse data types.
      2. Scalability: Non-relational databases are designed to scale horizontally, handling large volumes of data across distributed systems.
      3. Performance: Non-relational databases can provide high performance for specific use cases, such as real-time data processing and big data analytics.
      4. Schema-less Design: Non-relational databases do not require a fixed schema, allowing for more flexible data modeling.
- Example: Advantages of Relational Databases for Structured Archaeological Data:
  1. Data Integrity: Relational databases enforce data integrity through constraints and relationships between tables. This ensures that data is consistent and accurate, which is essential for maintaining reliable archaeological records. For example, a relational database can enforce referential integrity by ensuring that every artifact record references a valid excavation site.
  2. Structured Data: Relational databases are well-suited for structured data, such as artifact catalogs and excavation records. The tabular format allows for easy organization and retrieval of data. For example, an artifact table can have columns for artifact ID, type, material, and date of discovery.
  3. Complex Queries: Relational databases support complex queries using SQL, enabling archaeologists to perform detailed searches and analyses. For example, an archaeologist can query the database to find all artifacts made of a specific material from a particular excavation layer or to calculate the average age of artifacts from a specific site.
  4. Scalability: Relational databases can handle large datasets and support concurrent access by multiple users. This is important for collaborative archaeological projects where multiple researchers need to access and update the database simultaneously. For example, a relational database can support multiple archaeologists entering data from different excavation sites in real-time.
  5. Data Security: Relational databases provide robust security features, including user authentication, access controls, and encryption. This ensures that sensitive archaeological data is protected from unauthorized access. For example, a relational database can restrict access to certain tables or fields based on user roles, ensuring that only authorized personnel can view or modify sensitive data.

In summary, databases are essential tools in archaeology for managing and analyzing data. Relational databases, in particular, offer significant advantages for handling structured archaeological data, ensuring data integrity, and supporting complex queries. Non-relational databases provide flexibility and scalability for handling diverse and large volumes of data, making them suitable for specific use cases in archaeology.

8.2 Relational Database Concepts

Basic Concepts
- Introduction to Tables, Records, Fields, Primary and Foreign Keys:
  - Tables: In a relational database, data is organized into tables. Each table represents a specific entity, such as artifacts, excavation sites, or researchers. Tables consist of rows and columns.
  - Records: A record (or row) in a table represents a single instance of the entity. For example, a record in an artifacts table might represent a single artifact.
  - Fields: A field (or column) in a table represents a specific attribute of the entity. For example, fields in an artifacts table might include artifact ID, type, material, and date of discovery.
  - Primary Keys: A primary key is a unique identifier for each record in a table. It ensures that each record can be uniquely identified. For example, an artifact ID can serve as the primary key in an artifacts table.
  - Foreign Keys: A foreign key is a field in one table that links to the primary key in another table. It establishes a relationship between the two tables. For example, a site ID in an artifacts table can serve as a foreign key linking to the primary key in a sites table.
- Example: Organizing Excavation Data in Relational Tables:
  1. Site Locations Table:
    - Fields: Site ID (Primary Key), Site Name, Location, Excavation Date
    - Example Record: (1, “Site A”, “Location A”, “2023-01-01”)
  2. Stratigraphy Table:
    - Fields: Stratigraphy ID (Primary Key), Site ID (Foreign Key), Layer, Description
    - Example Record: (1, 1, “Layer 1”, “Topsoil layer”)
  3. Finds Table:
    - Fields: Find ID (Primary Key), Site ID (Foreign Key), Stratigraphy ID (Foreign Key), Artifact Type, Material, Date of Discovery
    - Example Record: (1, 1, 1, “Pottery”, “Clay”, “2023-01-02”)
By organizing excavation data into relational tables, archaeologists can efficiently manage and retrieve information about sites, stratigraphy, and finds.
Normalization
- Introduction to Database Normalization: Database normalization is the process of organizing data in a database to avoid redundancy and ensure data integrity. The goal is to structure the data in such a way that each piece of information is stored only once, reducing the risk of inconsistencies and improving data accuracy.
  - Normalization Principles:
    1. First Normal Form (1NF): Ensures that each field contains only atomic (indivisible) values and that each record is unique.
    2. Second Normal Form (2NF): Ensures that all non-key fields are fully dependent on the primary key.
    3. Third Normal Form (3NF): Ensures that all non-key fields are not only dependent on the primary key but also independent of each other.
Example: Structuring Artifact Data into Normalized Tables:
1. Artifacts Table:
  - Fields: Artifact ID (Primary Key), Artifact Type ID (Foreign Key), Material ID (Foreign Key), Condition ID (Foreign Key), Date of Discovery
  - Example Record: (1, 1, 1, 1, “2023-01-02”)
2. Artifact Types Table:
  - Fields: Artifact Type ID (Primary Key), Artifact Type
  - Example Record: (1, “Pottery”)
3. Materials Table:
  - Fields: Material ID (Primary Key), Material
  - Example Record: (1, “Clay”)
4. Conditions Table:
  - Fields: Condition ID (Primary Key), Condition
  - Example Record: (1, “Intact”)
By normalizing artifact data into separate tables for artifact types, materials, and conditions, archaeologists can avoid redundancy and ensure data integrity.
Entity-Relationship (ER) Models
An Entity-Relationship (ER) model is a visual representation of the relationships between different entities in a database. It helps in designing the database structure by illustrating how entities are connected and how data flows between them.

Components of an ER Model:
1. Entities: Represented by rectangles, entities are objects or concepts that have data stored about them. For example, “Site,” “Stratigraphy,” and “Finds” are entities.
2. Attributes: Represented by ovals, attributes are properties or characteristics of entities. For example, “Site Name” and “Location” are attributes of the “Site” entity.
3. Relationships: Represented by diamonds, relationships show how entities are connected. For example, a “Finds” entity might have a relationship with both the “Site” and “Stratigraphy” entities.
Example: Designing an ER Diagram for an Archaeological Database:
1. Entities:
  - Site (Attributes: Site ID, Site Name, Location, Excavation Date)
  - Stratigraphy (Attributes: Stratigraphy ID, Site ID, Layer, Description)
  - Finds (Attributes: Find ID, Site ID, Stratigraphy ID, Artifact Type, Material, Date of Discovery)
2. Relationships:
  - A “Site” can have multiple “Stratigraphy” layers (One-to-Many relationship).
  - A “Stratigraphy” layer can have multiple “Finds” (One-to-Many relationship).
  - A “Find” is associated with one “Site” and one “Stratigraphy” layer (Many-to-One relationship).
ER Diagram:
```
+----------------+          +----------------+          +------------------+  
|      Site      |          |  Stratigraphy  |          |       Finds      |  
|----------------|          |----------------|          |------------------|  
| Site ID (PK)   |<-------->| Site ID (FK)   |<-------->| Site ID (FK)     |  
| Site Name      |          | Stratigraphy ID|          | Stratigraphy ID  |  
| Location       |          | Layer          |          | Find ID (PK)     |  
| Excavation Date|          | Description    |          | Artifact Type    |  
+----------------+          +----------------+          | Material         |  
                                                        | Date of Discovery|  
                                                        +------------------+  
```
In this ER diagram, the “Site” entity is related to the “Stratigraphy” entity through the “Site ID” field, and the “Stratigraphy” entity is related to the “Finds” entity through the “Stratigraphy ID” field. This visual representation helps in understanding the relationships between different archaeological datasets and designing a well-structured database.

8.3 Introduction to SQL (Structured Query Language)

Basic SQL Commands
- - Overview of SQL syntax and common commands:
  1. CREATE: Used to create a new table or database. Syntax: “CREATE TABLE table_name (column1 datatype, column2 datatype, …);”
  2. INSERT: Used to insert new records into a table. Syntax: “INSERT INTO table_name (column1, column2, …) VALUES (value1, value2, …);”
  3. SELECT: Used to retrieve data from a table. Syntax: “SELECT column1, column2, … FROM table_name;”
  4. UPDATE: Used to modify existing records in a table. Syntax: “UPDATE table_name SET column1 = value1, column2 = value2, … WHERE condition;”
  5. DELETE: Used to delete records from a table. Syntax: “DELETE FROM table_name WHERE condition;”
- Example: Creating tables for archaeological data and inserting records.
  1. Creating a Table for Sites:
```
CREATE TABLE Sites (
  SiteID INT PRIMARY KEY, 
  SiteName VARCHAR(100), 
  Location VARCHAR(100), 
  ExcavationDate DATE
);
```
  2. Inserting Records into the Sites Table:
```
INSERT INTO Sites (SiteID, SiteName, Location, ExcavationDate)
VALUES (1, 'Site A', 'Location A', '2023-01-01');
```
    .

Creating a Database with SQL

Create the Database:

CREATE DATABASE ArchaeologyDB;

Use the Database:

USE ArchaeologyDB;

Create different Tables:

Artifact Table

CREATE TABLE Artifacts (
  ArtifactID INT PRIMARY KEY,
  ArtifactType VARCHAR(50),
  Material VARCHAR(5`),
  Context VARCHAR(100)
);

Sites Table

CREATE TABLE Sites (
  SiteID INT PRIMARY KEY,
  SiteName VARCHAR(100),
  Location VARCHAR(100),
  ExcavationDate DATE
);

Insert Records:

Insert into Artifacts Table:

INSERT INTO Artifacts (
  ArtifactID, 
  ArtifactType,
  Material, Context)
VALUES (1, 'Pottery', 'Clay', 'Layer 1');

Insert in Sites Table:

INSERT INTO Sites (SiteID, SiteName, Location, ExcavationDate)
VALUES (1, 'Site A', 'Location A', '2023-01-01');

Indexing and Keys
- Explanation of Primary Keys, Foreign Keys, and Indexing for Optimizing Database Performance:
  1. Primary Keys: A primary key is a unique identifier for each record in a table. It ensures that each record can be uniquely identified.
    - Example: ArtifactID in the Artifacts table.
  2. Foreign Keys: A foreign key is a field in one table that links to the primary key in another table. It establishes a relationship between the two tables.
    - Example: SiteID in the Artifacts table linking to SiteID in the Sites table.
  3. Indexing: Indexes are used to speed up the retrieval of data. They create a data structure that allows for faster searches.
    - Example: Creating an index on the ArtifactType field in the Artifacts table.
- Defining keys to link excavation sites to finds in different tables:
  1. Creating Foreign Key Relationship:
```
ALTER TABLE Artifacts 
ADD SiteID INT,
ADD CONSTRAINT FK_Site,
FOREIGN KEY (SiteID) REFERENCES Sites(SiteID);
```
  2. Creating an Index:
```
CREATE Index idx_ArtifactType
ON Artifacts (ArtifactType);
```
Basic Querying with SQL

Writing Basic Queries to Retrieve Data from a Database:
1. Select All Records:
```
SELECT * FROM Artifacts
```
2. Select Specific Columns:
```
SELECT ArtifactID, ArtifactType, Material FROM Artifacts
```
Example: Using SELECT statements to extract information about finds or excavation layers:
1. Retrieve All Artifacts:
```
SELECT * FROM Artifacts
```
2. Retrieve Artifacts of a Specific Type:
```
SELECT ArtifactID, ArtifactType, Material
FROM Artifacts
WHERE ArtifactType = 'Pottery'
```
Filtering and Sorting Data

Using WHERE, ORDER BY, and GROUP BY Clauses to Filter and Sort Data:
1. WHERE Clause: Used to filter records based on a condition.
  - Example:
```
SELECT * FROM Artifacts WHERE Material = 'Clay'
```
2. ORDER BY Clause: Used to sort the result set by one or more columns.
  - Example:
```
SELECT * FROM Artifacts ORDER BY DateOfDiscovery DESC
```
3. GROUP BY Clause: Used to group records that have the same values in specified columns.
  - Example:
```
SELECT Material, COUNT(*) FROM Artifacts GROUP BY Material
```
Querying Artifacts by Material Type or Sorting Excavation Records by Date:
1. Filter Artifacts by Material Type:
```
SELECT * FROM Artifacts
WHERE Material = 'Clay'
```
2. Sort Excavation Records by Date:
```
SELECT * FROM Sites
ORDER BY ExcavationDate DESC
```

Joining Tables

Using JOIN Statements to Combine Related Tables:

INNER JOIN: Combines records from two tables where there is a match in both tables.
- Syntax:
```
SELECT columns FROM table1 INNER JOIN table2 ON table1.column = table2.column
```
LEFT JOIN: Returns all records from the left table and matched records from the right table.
- Syntax:
```
SELECT columns FROM table1 LEFT JOIN table2 ON table1.column = table2.column
```
RIGHT JOIN: Returns all records from the right table and matched records from the left table.
- Syntax:
```
SELECT columns FROM table1 RIGHT JOIN table2 ON table1.column = table2.column
```

Writing Queries to Retrieve Artifacts Based on Their Excavation Context:

Join Artifacts with Sites:

SELECT Artifacts.ArtifactID, Artifacts.ArtifactType, Sites.SiteName, Sites.Location
FROM Artifacts
INNER JOIN Sites ON Artifacts.SiteID = Sites.SiteID

Join Artifacts with Stratigraphy:

SELECT Artifacts.ArtifactID, Artifacts.ArtifactType, Stratigraphy.Layer, Stratigraphy.Description
FROM Artifacts
INNER JOIN Stratigraphy ON Artifacts.StratigraphyID = Stratigraphy.StratigraphyID

8.4 Database Tools for Archaeology

Accessing Databases from R
- Introduction to R packages (DBI, RSQLite, RPostgres) for interacting with databases.
- Example: Connecting to an SQLite or PostgreSQL database from R for archaeological data analysis.
SQLite for small projects
- Introduction to SQLite as a lightweight database solution for archaeological projects.
- Example: Creating a simple SQLite database for site data using R (RSQLite, e.g. https://cran.r-project.org/web/packages/RSQLite/vignettes/RSQLite.html) or Python.
PostgreSQL for larger projects
- Overview of PostgreSQL for larger archaeological datasets.
- Example: Setting up a PostgreSQL database in R for managing excavation records.
Querying Databases in R
- Writing SQL queries in R and retrieving data for further analysis.
- Example: Querying an excavation database from R and visualizing the results with ggplot2.
Data Wrangling and Cleaning
- Using R to clean and manipulate database data for analysis.
- Example: Using dplyr in R to filter and transform queried data.

Hands-on Practice

Building a Small Archaeological Database
- Step-by-step walkthrough of creating a database schema for a hypothetical excavation project (using RSQLite).
- Example: Creating tables for stratigraphy, finds, and context.
Inserting and Managing Data
- Practical examples of adding and managing archaeological data.
- Example: Inserting excavation records into the database using SQL.
Q&A and Troubleshooting
- Addressing challenges in database design and SQL queries.