Chapter 5: Introduction to the Lifecycle of AI

Understanding the lifecycle of an AI system is a crucial first step in analyzing its risks. Each phase of the lifecycle demands unique measures to ensure responsible AI practices.

So what is the Lifecycle of AI?

The AI system lifecycle helps you understand the technical journey of an AI system—from initial planning to operation, monitoring, and sometimes retirement. While the process isn't always linear, each phase carries its own unique risks that can affect your system's performance, compliance, and user trust.

While different organizations and frameworks (like ISO or the OECD) offer slightly different versions of this lifecycle, we’ll stick to the NIST AI Risk Management Framework (RMF) throughout the book for consistency.

But let’s first look at how the OECD defines it.

OECD's Definition of the AI System Lifecycle

According to the OECD:

“The AI system lifecycle involves phases such as: i) Design, data, and models – a context-dependent sequence including planning, data collection, and model building; ii) Verification and validation; iii) Deployment; iv) Operation and monitoring.”

These phases happen iteratively—not always in order. AI systems can also be retired at any point during the operation and monitoring phase.

Remember: Evaluating your AI system's lifecycle is essential for understanding where security, fairness, and data protection risks can arise.

[Diagram: Lifecycle stages of an AI system. Source: OECD, AI Terms and Concepts]

A Closer Look at Phases of the AI Lifecycle

Gaining insight into the technical stages of the AI/ML project lifecycle is key to adopting a practice-based approach to AI ethics and safety. Different types of expertise are necessary to evaluate each stage of the AI lifecycle.

Measuring risk at different stages of the AI lifecycle can yield varying results, as some risks may emerge later as AI systems evolve. Different AI actors may have different risk perspectives. For example, an AI developer of pre-trained models may see risks differently than someone deploying the model in a specific context. Deployers might not recognize unique risks associated with their use case. All AI actors share the responsibility for ensuring the AI system is trustworthy and fit for its intended purpose.

Phase 1: Plan & Design

When starting an AI project, the team needs to decide whether to build a model from scratch or adopt an existing one. This depends on available resources, data, existing technologies, and the complexity of the problem. Both the choice of model and the adoption approach depend on the purpose of the AI system and on where the necessary training data will be acquired.

Rather than developing a custom algorithmic model, teams can consider options such as:

  • Proprietary Models: Some platforms offer customizable models that can be fine-tuned on specific datasets.

  • Open Source Models: The open-source community contributes significantly to model development and transparency, offering adaptable and well-documented architectures.

  • Cloud-Based Models: Cloud providers offer scalable models that integrate seamlessly into existing infrastructures and services.

The team must still set clear goals, understand stakeholder needs, and assess their capabilities. They should define what problem they're solving and how to measure success. At this point, it’s important to map out the impacted stakeholders who can be directly or indirectly affected by the deployment of an AI system or application.

The AI lifecycle begins with clearly defining the problem and understanding its context, objectives, stakeholders, and data requirements. Risk assessments and impact assessments remain essential to ensure fairness, security, and legal compliance.

Phase 2: Collect and Process Data

The first step after the design stage in the machine learning process is gathering relevant data from various sources. This data needs to be cleaned and transformed to ensure it's ready for analysis, which is crucial for building accurate machine learning models. Data can be collected through methods like web scraping, surveys, or legal agreements to obtain existing datasets.

Once you have your data, it's important to visualize the data and look at summary statistics to understand its structure. Key questions include: Are there missing values? Are there outliers or unexpected data points? Is the data balanced? This analysis helps you understand the data better and can inform further data collection and preparation steps.
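
To make this first look at the data concrete, here’s a minimal sketch using pandas. The file name and column name are placeholders for illustration, not from any specific project:

```python
import pandas as pd

# Load the raw dataset (file name is a placeholder).
df = pd.read_csv("patient_records.csv")

# Summary statistics: ranges, means, and obvious outliers.
print(df.describe(include="all"))

# Missing values per column.
print(df.isna().sum())

# Class balance for the label column ("diagnosis" is a hypothetical column name).
print(df["diagnosis"].value_counts(normalize=True))
```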

Data labeling is another important step, where labels are added to raw data like images, videos, or text to categorize it for easier identification. Cleaning the data is crucial, especially for large datasets, as they often contain missing values or irrelevant information. Removing these helps improve the accuracy of your model and reduces errors and biases.

Types of Data Used in AI Systems

Understanding where your data comes from is key to managing risks and responsibilities in AI development. Here’s a breakdown of the five common types of data inputs used in training and operating AI systems based on OECD:

  • Expert Input: This includes structured human knowledge, like ontologies, knowledge graphs, or analytic rules (e.g., objective functions or reward signals). Think of this as codified expertise—rules the model uses to make sense of the world.

  • Provided Data: Data that people or organizations knowingly supply. This could be initiated (like filling out a license application), transactional (such as payment records), or posted (like a social media update).

  • Observed Data: Captured through sensors or monitoring tools—think GPS location, website clicks, or temperature readings. This can be:

    • Engaged: like consenting to cookie tracking,

    • Unanticipated: like measuring how long you stare at a specific image, or

    • Passive: like CCTV footage in a public space.

  • Synthetic Data: Artificially generated through simulations or models. This helps replicate rare or expensive scenarios—like crash testing a self-driving car virtually. Synthetic data mimics reality but is built algorithmically.

  • Derived Data: Created by transforming existing data into something new. Examples include credit scores or risk ratings. These can be inferred (using probability models) or aggregated (from more granular inputs). Often, derived data is also proprietary.

Data Splits

Splitting your data correctly is key to training a successful machine learning model and avoiding a common problem called overfitting—this is when a model learns the data by heart instead of learning to understand different patterns.

To prevent this, you need to divide your data into three main parts:

Training Data: This should be the largest portion of your data. You use this to teach the model what to look for and how to make predictions.

Validation Data: This is the second largest slice of your data. While you're still developing the model, you use this set to check how well it's performing and to make adjustments, such as tuning hyperparameters.

Test Data: This is the smallest part of your data set and is used last. Once your model does well with the validation data, you use the test data to see how it handles completely new information.

It’s important to mix the data up randomly to avoid any bias, except in special cases like time series data where the order matters. This method helps ensure your model can handle real-world tasks and not just the examples it was trained on.
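
As a rough sketch of what this split can look like in code, here’s one way to do it with scikit-learn. The proportions, random seed, and feature/label arrays are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels so the sketch runs on its own.
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

# Keep 70% for training, then split the remaining 30% into validation and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, shuffle=True, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, shuffle=True, random_state=42
)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```

For time series data you would disable shuffling and split chronologically instead, so the model is never tested on data from before its training window.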

Datasheets for Datasets

In order to govern models effectively, we need to understand what we’re feeding them—especially in the training data context. That’s why it’s crucial to keep an inventory of the datasets used during training and record key metadata about them.

To standardize this process, think of datasheets for datasets like nutrition labels for food. They help achieve transparency about the model by documenting the types of data used, data volume, copyright information, and other essential details about the underlying training sets.

This not only satisfies the curiosity (and concerns) of end users and regulators—it’s also a major step toward accountability.
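
One lightweight way to start such an inventory is to record a few key fields per dataset in code. This is only a sketch; the fields and values below are illustrative, not a formal datasheet template:

```python
from dataclasses import dataclass, asdict

@dataclass
class DatasheetEntry:
    """One entry in an internal inventory of training datasets (fields are illustrative)."""
    name: str
    source: str
    data_types: list
    record_count: int
    license: str
    collection_method: str
    known_gaps: str

entry = DatasheetEntry(
    name="chest-xray-train-v2",  # hypothetical dataset name
    source="Hospital partner, collected 2019-2023",
    data_types=["image", "diagnosis label"],
    record_count=112_000,
    license="Restricted research use",
    collection_method="Provided data under a data-sharing agreement",
    known_gaps="Few pediatric cases; single geographic region",
)

print(asdict(entry))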

Phase 3: Build and Use Model

Before an AI system can actually do anything useful, it needs to be trained — that’s just a fancy way of saying it learns from examples.

This happens in a controlled setting, like a lab or development environment. Think of it like teaching a kid using flashcards. You show the system lots of examples — texts, numbers, pictures — and it starts to learn patterns.

For example, let’s say you want to fine-tune a model to help diagnose medical conditions from X-ray images. You’d feed it a bunch of X-rays, each labeled with a diagnosis — like pneumonia, fracture, or normal. The algorithm (which is just a smart set of rules) begins to notice what pneumonia typically looks like, how a fracture appears, and so on. Over time, the model gets better at telling these apart on its own.

Once it’s trained, the model can look at new images and make a guess — just like a student who’s learned from enough flashcards and can now answer questions without help.

This whole process is often called “training” or “optimisation.” You decide what you want the model to do (like spot lung issues) and how you’ll check if it’s doing it well (such as accuracy or speed).

AI systems that use machine learning are everywhere now. Instead of giving the AI step-by-step instructions, machine learning lets it learn on its own by spotting patterns in data. You usually show the system a lot of examples with the right answers already labeled — like medical scans marked with the correct diagnosis — and it figures out the patterns.
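
As a toy illustration of supervised learning, here’s a minimal sketch that trains a simple decision tree on synthetic labeled examples and then checks its guesses on examples it hasn’t seen. The data and settings are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic "labeled examples" standing in for real data such as annotated scans.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training": the model looks for patterns that link features to labels.
model = DecisionTreeClassifier(max_depth=4, random_state=0)
model.fit(X_train, y_train)

# "Inference": the trained model makes a guess on examples it has not seen.
predictions = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, predictions))
```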

Sometimes, you don’t give it any labeled answers. Instead, you let the system explore and learn by trial and error, kind of like learning how to ride a bike. This is useful when you don’t know the “right answer” for every example.

There are lots of ways to train models — from simple methods like decision trees and basic equations to more complex ones like deep neural networks (which mimic how our brains work). Whether your data is labeled or not changes how you train your model — but we covered that earlier in the section on data and metadata.

Model Transparency: Why It Matters

Once the model is built, it’s important for people to understand how it works — especially when it’s being used in sensitive or high-risk areas like health or hiring.

That’s where model cards come in. These are like fact sheets for AI models. They help explain what the model does well, where it might struggle, and how it should (or shouldn’t) be used.

Model cards are useful for both technical teams and non-technical users. Developers can use them to build better apps, while end users — or even regulators — can see how the model was trained, what data it used, and what risks to watch out for.

They’re also key for spotting problems like bias. For example, if a model works well for one group of people but not for others — say, it’s more accurate for one skin tone than another — model cards help flag that. This kind of transparency encourages developers to think about fairness and accountability from day one and to keep those values in mind throughout the AI system’s lifecycle.
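
There’s no single required format, but as a rough sketch, the kind of information a model card captures might look like this (all names and values below are hypothetical):

```python
# A minimal, illustrative model card captured as a plain dictionary.
# All field names and values are hypothetical placeholders.
model_card = {
    "model_name": "xray-triage-v1",
    "intended_use": "Assist clinicians in prioritising chest X-rays",
    "out_of_scope_use": "Unsupervised diagnosis without clinician review",
    "training_data": "chest-xray-train-v2 (see its datasheet)",
    "evaluation_metrics": {"accuracy": 0.91, "recall_pneumonia": 0.88},
    "known_limitations": "Lower accuracy on pediatric scans",
    "fairness_notes": "Performance checked across age and sex subgroups",
    "contact": "ml-governance@example.org",
}

for field, value in model_card.items():
    print(f"{field}: {value}")
```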

Model cards are quickly becoming a standard part of responsible AI development — and in many cases, they’re required for legal compliance, like under the EU AI Act.

Phase 4: Verify and Validate

Ensuring your AI models work as expected is crucial. First, you need to train your model with data and then test it with new, unseen data to check its performance. This helps you see if the model can make accurate predictions outside of what it was directly taught. Fine-tuning the model using validation sets allows for adjustments to high-level aspects like hyperparameters, much like refining a recipe for the perfect taste. After these steps, evaluating the model by checking various performance metrics and documenting the process is essential for transparency and accountability.

📏 What metrics should we follow?

Here’s a quick and friendly breakdown of common metrics people use to check how well an AI system is performing:

  • Accuracy: Think of this as the “Did it get it right?” score. It's great for stuff like quizzes or yes/no questions. For more creative tasks (like writing), you'd instead use text-similarity metrics, which compare the AI’s text to a “correct” reference version to see how close it gets.

  • Precision: This one’s picky. It asks, “Out of all the times the model said something was right, how often was it actually right?” Good for making sure your AI isn’t too trigger-happy.

  • Recall: Now we ask, “Did the AI miss anything important?” It looks at how good the model is at catching all the relevant stuff — even if it occasionally grabs the wrong thing.

  • F1 Score: Can’t decide between precision and recall? The F1 score is like their lovechild — it balances both and gives you a single score to care about.

No need to get lost in math here. Just remember:

  • High precision = less nonsense

  • High recall = less missed stuff

  • High F1 = a good mix of both
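
If you want to see these numbers in practice, scikit-learn computes them directly. The labels and predictions below are made up for illustration:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Hypothetical ground-truth labels and model predictions (1 = positive case).
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("Accuracy: ", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall:   ", recall_score(y_true, y_pred))
print("F1 score: ", f1_score(y_true, y_pred))
```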

Overfitting and underfitting

  • Overfitting: Training for too many iterations on a limited dataset can make the model overly specialized, leading to poor performance on unseen data.

  • Underfitting: Inadequate training or overly simple models may fail to capture the complexity of the task, resulting in general inaccuracies.

Validation ensures the model performs well and meets its goals, involving several steps: quality assurance, safety and compliance, and transparency. It guarantees the AI system produces high-quality results, operates safely, and meets compliance standards. Additionally, it provides clear information to stakeholders, reassuring them about the system’s reliability.

If the training data contains biases (e.g., societal, cultural, or linguistic biases), the model may replicate or amplify those biases in its outputs. Understanding how your model makes decisions is vital. Tools like LIME and SHAP help explain the model’s predictions, identify biases or errors, and support fairness. By training, testing, fine-tuning, evaluating, and comparing models, you develop reliable AI systems that stakeholders can trust. This thorough validation process builds confidence in the AI's transparency, safety, and performance.
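
As one example of what an explanation tool can look like in code, here’s a minimal sketch using SHAP’s tree explainer on a small tree-based model trained on synthetic data (it assumes the shap package is installed; the data and model are placeholders):

```python
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Train a small tree-based model on synthetic data for illustration.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# TreeExplainer attributes each prediction to the input features, which helps
# reveal whether the model leans heavily on a problematic feature.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:20])
print(shap_values)
```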

Phase 5: Deploy and Use

An AI model can be used in many different ways, and “inference” is the process of using an AI model (trained from data or manually encoded) to derive a prediction, recommendation, or other outcome based on new data that the model was not trained on. People who will use the system must be trained to understand how it works, explain its decisions in simple terms, and ensure the outputs are high-quality, reliable, and fair.

Deterministic vs. Probabilistic Models

AI systems often rely on different modeling approaches depending on the type and certainty of available data. Two major types you'll encounter are:

  • Deterministic Models: These operate based on fixed rules—think “if this, then that.” They always produce the same output given the same input. There's no ambiguity here; it's a predictable, one-outcome scenario.

  • Probabilistic Models: These work in fuzzier territory. They explore multiple potential explanations for the data and calculate the likelihood of each. That means both the model and its outcomes come with a degree of uncertainty—but this uncertainty is measurable.

Probabilistic models allow you to optimize for performance metrics like confidence, robustness, or risk, using different inferencing techniques depending on your use case.
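
To make the contrast concrete, here’s a small sketch: a fixed rule that always returns the same answer for the same input, next to a probabilistic classifier that reports how confident it is. The rule, threshold, and data are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Deterministic: a fixed rule always gives the same answer for the same input.
def approve_loan(income: float, debt: float) -> bool:
    return income - debt > 20_000  # illustrative threshold

print(approve_loan(income=65_000, debt=30_000))  # always True for this input

# Probabilistic: the model reports how likely each outcome is,
# so the uncertainty is explicit and measurable.
X, y = make_classification(n_samples=400, n_features=5, random_state=1)
model = LogisticRegression(max_iter=1000).fit(X, y)
print(model.predict_proba(X[:3]))  # e.g. [[0.82, 0.18], ...]
```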

Training users is essential because it helps them use the system effectively. Without proper training, users might trust the AI too much and ignore their own judgment. For example, someone using an AI system might follow its recommendations without questioning them, even when they don't seem right. In safety-critical areas, users might stick to AI recommendations out of fear of making mistakes, which can lead to problems.

After the AI model is developed and tested, it’s time for deployment. This means setting up the model to work with other applications and be used by business users. The process includes training staff, changing workflows, and keeping the system updated to handle new data and changing business needs. Continuous maintenance and adjustments ensure the AI system remains effective and accurate.

Phase 6: Operate and Monitor

Once your AI model is out in the real world, your job isn’t over. You’ve got to keep checking in to make sure it’s still doing what it’s supposed to do—and doing it responsibly.

Over time, things change. Maybe the data starts looking different, or users behave in new ways. This can cause something called model drift—basically, your AI starts getting things wrong because it’s working with data that feels unfamiliar. That’s why regular check-ups are key.
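
One simple way to watch for drift is to compare the distribution of a feature at training time with what the live system is seeing now. Here’s a sketch using a two-sample Kolmogorov-Smirnov test; the data and threshold are synthetic and illustrative:

```python
import numpy as np
from scipy.stats import ks_2samp

# Feature values seen at training time vs. in production (synthetic for illustration).
training_feature = np.random.normal(loc=0.0, scale=1.0, size=5000)
live_feature = np.random.normal(loc=0.4, scale=1.2, size=5000)  # shifted: drift

# Two-sample Kolmogorov-Smirnov test: a small p-value suggests the live data
# no longer looks like the data the model was trained on.
statistic, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.01:  # illustrative threshold
    print(f"Possible drift detected (KS statistic={statistic:.3f}, p={p_value:.3g})")
else:
    print("No significant drift detected")
```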

You should:

  • Watch how the model is performing.

  • Log what it’s doing (what data it’s using, what outputs it’s giving).

  • Collect feedback from users.

  • Track any weird or unintended behavior.

Sometimes, if performance drops too much, you’ll need to go back to earlier steps—like re-training the model or even rethinking parts of the design. In more serious cases, it might mean retiring the model completely.

🛠️ Logging and record-keeping aren’t just helpful—they’re legally important too. Especially under rules like the EU AI Act, you need to keep proper logs and show that the AI is safe, fair, and working as promised.

⚠️ Keep in mind: some logs might include sensitive data (like user info), so you’ll need to handle that carefully to avoid leaks or misuse.
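
One way to square logging with privacy is to record what you need for accountability while keeping raw personal data out of the log. The sketch below is illustrative; the field names and hashing choice are assumptions, not a compliance recipe:

```python
import hashlib
import json
import logging
from datetime import datetime, timezone

logging.basicConfig(filename="predictions.log", level=logging.INFO)

def log_prediction(user_input: str, output: str) -> None:
    """Record what the model was asked and what it answered, storing only a
    hash of the raw input to limit exposure of personal data."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "input_hash": hashlib.sha256(user_input.encode()).hexdigest(),
        "output": output,
    }
    logging.info(json.dumps(record))

log_prediction("patient free-text symptoms ...", "refer to radiology")
```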

💡 Think of monitoring as your AI’s regular health check. It’s how you keep it sharp, ethical, and aligned with your goals over time.

Phase 7: Decommissioning and Retirement

At some point, you might decide to retire an AI system — maybe it’s outdated, not working as expected, or just no longer needed. But you can’t just unplug and forget about it.

Before shutting things down, make sure you follow rules like GDPR and CCPA. That means safely deleting or archiving personal data tied to the AI. If you skip this step, you could leave serious privacy holes behind.

When an AI system goes offline, it might impact people who rely on it. For example, in healthcare, shutting down a diagnosis model could confuse patients and doctors. Always give people a heads-up and make sure there’s a backup plan. There’s more to this than flipping a switch. You need to handle data responsibly, think about bias and fairness, take ownership of the decision, and be transparent. That way, you keep trust intact and avoid causing harm.

In short: AI offboarding needs care too. Do it right, and you show that your organization treats AI — and its impact on people — seriously.
