Artificial Intelligence (AI) is the field of computer science that enables machines to perform tasks that typically require human intelligence, such as learning, reasoning and problem-solving. It aims to create systems capable of perceiving their environment and making decisions autonomously. Unlike traditional programming, where explicit rules are written for every scenario, AI systems can learn from data, adapt to new situations and make decisions.
Example: A rule-based spam filter uses explicit conditions (if subject contains “free” → mark as spam) while an AI-based spam filter learns patterns from emails and improves over time.
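To make the contrast concrete, here is a minimal Python sketch of both filters. The function and class names, the tiny training set and the choice of a naive Bayes word model with add-one smoothing are illustrative assumptions, not a specific product's method.

```python
import math
from collections import Counter
from typing import Iterable, Tuple


def rule_based_is_spam(subject: str) -> bool:
    # Hand-written rule: it only changes if a programmer edits it.
    return "free" in subject.lower()


class NaiveBayesSpamFilter:
    """Learns word statistics from labelled emails (illustrative sketch)."""

    def __init__(self) -> None:
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts: Counter = Counter()

    def train(self, examples: Iterable[Tuple[str, str]]) -> None:
        # More labelled data updates the counts, so the filter improves over time.
        for text, label in examples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())

    def is_spam(self, text: str) -> bool:
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus log likelihood of each word, with add-one smoothing.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            scores[label] = score
        return scores["spam"] > scores["ham"]


spam_filter = NaiveBayesSpamFilter()
spam_filter.train([
    ("win money now", "spam"),
    ("cheap prize win", "spam"),
    ("meeting agenda attached", "ham"),
    ("lunch meeting tomorrow", "ham"),
])
print(rule_based_is_spam("Win a prize"))   # False: the rule never mentions "prize"
print(spam_filter.is_spam("win a prize"))  # True: learned from the examples
```

The rule-based version misses any spam that avoids the word "free", while the learned version generalizes from whatever words appeared in past spam.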
AI can be classified into 3 types based on its capabilities:
1. Narrow AI (Weak AI):
2. General AI (Strong AI):
3. Super AI:
AI can be classified into 4 types based on its functionalities:
1. Reactive Machines:
2. Limited Memory:
3. Theory of Mind:
4. Self-Aware AI:
Let's see the difference between Symbolic AI and Connectionist AI:
| Aspect | Symbolic AI | Connectionist AI |
|---|---|---|
| Definition | AI based on explicit rules and logic to represent knowledge. | AI based on neural networks, learning patterns from data. |
| Knowledge Representation | Uses symbols, facts and logic statements (e.g., “IF…THEN…” rules). | Uses distributed representations across nodes in a network. |
| Learning | Limited learning; mostly pre-programmed rules. | Learns from data; adapts over time. |
| Example | Expert systems, Prolog-based reasoning systems. | Neural networks for pattern recognition, speech or image recognition. |
| Strengths | Good at reasoning, explainable, interpretable. | Good at handling noisy or unstructured data. |
| Limitations | Cannot handle ambiguity well; rigid. | Difficult to interpret; “black-box” behavior. |
Let's see the difference between Parametric and Non-Parametric Models:
| Aspect | Parametric Models | Non-Parametric Models |
|---|---|---|
| Definition | Models with a fixed number of parameters. | Models where number of parameters grows with data. |
| Assumption | Assumes a specific functional form for data distribution. | Makes few or no assumptions about data distribution. |
| Learning | Learns a fixed set of parameters from training data. | Learns data patterns directly from training data. |
| Example | Linear regression, Logistic regression. | k-Nearest Neighbors (k-NN), Decision Trees. |
| Strengths | Efficient, simpler, easier to interpret. | Flexible, can model complex distributions. |
| Limitations | Limited flexibility; may underfit if model is wrong. | Computationally expensive; may overfit with small data. |
An AI agent is an autonomous system or software entity that interacts with its environment to achieve specific objectives. Unlike traditional programs that execute fixed instructions, an AI agent senses the environment, reasons about it and takes actions to maximize a defined goal or utility. The agent operates in a continuous perceive → reason → act → perceive cycle:
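Example: A self-driving car perceives the road through its sensors, reasons about traffic and obstacles and acts by steering, braking or accelerating, then perceives the result of its actions.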
AI agents can be classified based on how they perceive, reason and act in the environment. Their complexity increases from simple reflex agents to utility-based agents, allowing them to handle more sophisticated tasks:
1. Simple Reflex Agents:
2. Model-Based Reflex Agents:
3. Goal-Based Agents:
4. Utility-Based Agents:
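The perceive → reason → act cycle can be sketched with the classic two-square vacuum world and a simple reflex agent. The squares "A" and "B", the percept format and the action names are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Percept = Tuple[str, str]  # (location, status), e.g. ("A", "Dirty")


def reflex_vacuum_agent(percept: Percept) -> str:
    # Simple reflex agent: the action depends only on the current percept.
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"


def run(world: Dict[str, str], location: str, steps: int) -> List[str]:
    actions = []
    for _ in range(steps):
        percept = (location, world[location])  # perceive
        action = reflex_vacuum_agent(percept)  # reason (condition-action rule)
        actions.append(action)
        if action == "Suck":                   # act: change the environment
            world[location] = "Clean"
        else:
            location = "B" if action == "Right" else "A"
    return actions


world = {"A": "Dirty", "B": "Dirty"}
print(run(world, "A", 4), world)
# ['Suck', 'Right', 'Suck', 'Left'] {'A': 'Clean', 'B': 'Clean'}
```

A model-based reflex agent would additionally keep internal state (for example, which squares it has already cleaned) so that it could stop instead of moving back and forth forever.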
In AI, problem formulation is the process by which an agent defines the task it needs to solve in terms of states, actions, goals and path costs. Proper problem formulation is critical because it determines the efficiency and feasibility of search and decision-making algorithms.
Key Components of Problem Formulation:
1. Initial State:
2. Actions:
3. Transition Model (Successor Function):
4. Goal State:
5. Path Cost:
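A minimal sketch of how the five components might map onto code for a route-finding problem. The `RouteProblem` class, its method names and the road map are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RouteProblem:
    initial_state: str                   # 1. Initial state
    goal_state: str                      # 4. Goal state
    roads: Dict[str, Dict[str, float]]   # city -> {neighboring city: distance}

    def actions(self, state: str) -> List[str]:
        # 2. Actions: "drive to a neighboring city"
        return list(self.roads.get(state, {}))

    def result(self, state: str, action: str) -> str:
        # 3. Transition model: driving to a city puts the agent in that city
        return action

    def is_goal(self, state: str) -> bool:
        return state == self.goal_state

    def path_cost(self, path: List[str]) -> float:
        # 5. Path cost: sum of the step costs (road lengths) along the path
        return sum(self.roads[a][b] for a, b in zip(path, path[1:]))


problem = RouteProblem("A", "D", {"A": {"B": 4, "C": 2}, "B": {"D": 5}, "C": {"B": 1, "D": 8}, "D": {}})
print(problem.actions("A"), problem.path_cost(["A", "C", "B", "D"]))  # ['B', 'C'] 8
```

Any search algorithm that only calls `actions`, `result`, `is_goal` and `path_cost` can then be reused unchanged across problems.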
Search algorithms in AI are used to explore the state space of a problem to find a solution. They can be broadly classified into:
| Aspect | Uninformed Search | Informed Search |
|---|---|---|
| Definition | Explores blindly without extra info about goal | Uses heuristics to guide search toward goal |
| Knowledge | Only knows actions, states and goal | Knows estimated cost to goal (heuristic function) |
| Efficiency | Can be slower; may explore unnecessary paths | Faster; prioritizes likely solutions |
| Example | BFS, DFS, Uniform-Cost Search | Greedy Best-First, A* Search |
1. Breadth-First Search (BFS):
2. Depth-First Search (DFS):
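A minimal sketch of both uninformed strategies on a small unweighted graph (the graph is made up). BFS uses a FIFO queue and so finds the path with the fewest edges; DFS uses a LIFO stack and follows one branch as deep as possible first.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def bfs(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    frontier = deque([[start]])  # FIFO queue of paths
    visited = {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nbr in graph.get(path[-1], []):
            if nbr not in visited:
                visited.add(nbr)
                frontier.append(path + [nbr])
    return None


def dfs(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    stack = [[start]]  # LIFO stack of paths
    visited = set()
    while stack:
        path = stack.pop()
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push in reverse so neighbors are explored in their listed order.
        for nbr in reversed(graph.get(node, [])):
            if nbr not in visited:
                stack.append(path + [nbr])
    return None


graph = {"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": ["G"], "G": []}
print(bfs(graph, "S", "G"))  # ['S', 'B', 'G']
print(dfs(graph, "S", "G"))  # ['S', 'A', 'C', 'G']
```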
Uniform-Cost Search is an uninformed search algorithm that expands the node with the lowest cumulative path cost from the start node. Unlike BFS which expands nodes level by level, UCS considers the cost of reaching a state, making it more suitable when step costs vary.
How it works:
Properties:
Use Cases:
Example: If traveling between cities where road lengths differ, UCS will find the shortest-distance route, not just the one with fewer hops (like BFS).
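A minimal UCS sketch using Python's `heapq` as the priority queue ordered by cumulative path cost g(n). The road map is hypothetical: the direct road A→D is one hop but long, while A→B→C→D has more hops but is shorter overall.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]  # node -> {neighbor: step cost}


def uniform_cost_search(graph: Graph, start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    frontier = [(0.0, start, [start])]  # priority queue ordered by path cost g(n)
    best_cost = {start: 0.0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:  # goal test on expansion (not generation) keeps UCS optimal
            return cost, path
        if cost > best_cost[node]:
            continue  # stale queue entry: a cheaper path to node was already found
        for nbr, step in graph.get(node, {}).items():
            new_cost = cost + step
            if new_cost < best_cost.get(nbr, float("inf")):
                best_cost[nbr] = new_cost
                heapq.heappush(frontier, (new_cost, nbr, path + [nbr]))
    return None


roads = {"A": {"B": 1, "D": 10}, "B": {"C": 1}, "C": {"D": 1}, "D": {}}
print(uniform_cost_search(roads, "A", "D"))  # (3.0, ['A', 'B', 'C', 'D'])
```

BFS on the same map would return the single-hop route A→D with cost 10.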
Greedy Best-First Search is an informed search algorithm that expands the node which appears to be closest to the goal based on a heuristic function h(n) (an estimate of the cost from node n to the goal).
How it works:
Advantages:
Limitations:
Example: In a map problem, Greedy Search may choose the city that looks closest to the destination “as the crow flies,” but may end up on a longer or blocked route compared to UCS or A*.
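A minimal sketch of Greedy Best-First Search ordering the frontier by h(n) alone. The graph and heuristic values are hypothetical; ties in h are broken alphabetically by node name here.

```python
import heapq
from typing import Dict, List, Optional

Graph = Dict[str, Dict[str, float]]


def greedy_best_first(graph: Graph, h: Dict[str, float], start: str, goal: str) -> Optional[List[str]]:
    frontier = [(h[start], start, [start])]  # ordered by h(n) only; path cost is ignored
    visited = set()
    while frontier:
        _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nbr in graph.get(node, {}):
            if nbr not in visited:
                heapq.heappush(frontier, (h[nbr], nbr, path + [nbr]))
    return None


graph = {"S": {"A": 1, "B": 4}, "A": {"G": 10}, "B": {"G": 2}, "G": {}}
h = {"S": 5, "A": 2, "B": 2, "G": 0}  # hypothetical straight-line estimates
path = greedy_best_first(graph, h, "S", "G")
print(path, sum(graph[a][b] for a, b in zip(path, path[1:])))  # ['S', 'A', 'G'] 11
```

The route S→B→G would cost only 6, but Greedy never compares accumulated costs, so it commits to the first path that looks close.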
The A* (A-star) algorithm is an informed search algorithm used to find the least-cost path from a start node to a goal node. It combines both the actual cost of reaching a state and the estimated cost of reaching the goal from that state into a single evaluation function.
A* balances two components:
1. Path Cost (g(n)):
2. Heuristic Estimate (h(n)):
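The combination is expressed as:
$$f(n) = g(n) + h(n)$$
where f(n) is the estimated cost of the cheapest solution path that passes through node n.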
Step-by-Step Working of A*
Example: Imagine navigating from City A to City G:
Thus, A* selects paths that are both cheapest so far and promising toward the goal.
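A minimal A* sketch that orders the frontier by f(n) = g(n) + h(n). The graph and heuristic are hypothetical; h is admissible here (it never overestimates the true remaining cost), which is the condition under which A* returns the least-cost path.

```python
import heapq
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, Dict[str, float]]


def a_star(graph: Graph, h: Dict[str, float], start: str, goal: str) -> Optional[Tuple[float, List[str]]]:
    # Queue entries: (f = g + h, g, node, path)
    frontier = [(h[start], 0.0, start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        if g > best_g[node]:
            continue  # stale entry: a cheaper path to node is already known
        for nbr, step in graph.get(node, {}).items():
            new_g = g + step
            if new_g < best_g.get(nbr, float("inf")):
                best_g[nbr] = new_g
                heapq.heappush(frontier, (new_g + h[nbr], new_g, nbr, path + [nbr]))
    return None


graph = {"S": {"A": 1, "B": 4}, "A": {"G": 10}, "B": {"G": 2}, "G": {}}
h = {"S": 5, "A": 2, "B": 2, "G": 0}
print(a_star(graph, h, "S", "G"))  # (6.0, ['S', 'B', 'G'])
```

A first reaches G through A with g = 11, but because B's f-value (6) is lower, B is expanded before that entry and the cheaper path through B wins.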
Hill Climbing is a heuristic-based optimization algorithm in Artificial Intelligence that belongs to the family of local search methods. It treats problem-solving as a process of searching for the best state in a state space using an evaluation (objective) function.
Thus, Hill Climbing is essentially a greedy search strategy that only looks at the immediate best move, without considering the global structure of the state space.
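A minimal steepest-ascent hill climbing sketch over a made-up one-dimensional landscape, where a state is an index and its neighbors are the indices on either side. Depending on the starting point, it stops at a local maximum or at the global one.

```python
from typing import Sequence


def hill_climb(scores: Sequence[float], start: int) -> int:
    """Steepest-ascent hill climbing over indices of a 1-D landscape."""
    current = start
    while True:
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < len(scores)]
        best = max(neighbors, key=lambda i: scores[i])
        if scores[best] <= scores[current]:
            return current  # no uphill neighbor: a local (possibly global) maximum
        current = best


landscape = [1, 3, 5, 4, 2, 6, 9, 7]
print(hill_climb(landscape, 0), hill_climb(landscape, 4))  # 2 6
```

Starting at index 0 the search stops at index 2 (score 5), a local maximum, even though index 6 (score 9) is better.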
Local Optima Problems in Hill Climbing
Because Hill Climbing only considers immediate neighbors, it can fail to find the global optimum:
1. Local Maxima/Minima
2. Plateaus
3. Ridges
Examples
1. Hill Climbing: It is a local search algorithm that attempts to find the optimal solution by iteratively moving to a neighboring state with a better evaluation score. However, because it only considers immediate improvements, it often gets trapped in local optima, plateaus or ridges. To overcome these limitations, variants such as stochastic hill climbing and simulated annealing introduce randomness or controlled exploration to help escape suboptimal solutions and approach the global optimum.
2. Simulated Annealing:
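A minimal simulated annealing sketch on a made-up one-dimensional landscape. It always accepts improving moves and accepts a worse move with probability e^(Δ/T), where the temperature T decays geometrically; the starting temperature, cooling rate and step count are arbitrary assumptions.

```python
import math
import random
from typing import Sequence


def simulated_annealing(scores: Sequence[float], start: int, t0: float = 10.0,
                        cooling: float = 0.95, steps: int = 500, seed: int = 0) -> int:
    rng = random.Random(seed)  # seeded for reproducibility
    current = best = start
    t = t0
    for _ in range(steps):
        neighbors = [i for i in (current - 1, current + 1) if 0 <= i < len(scores)]
        candidate = rng.choice(neighbors)
        delta = scores[candidate] - scores[current]
        # Always accept improvements; accept worse moves with probability e^(delta / T).
        if delta > 0 or rng.random() < math.exp(delta / t):
            current = candidate
        if scores[current] > scores[best]:
            best = current  # remember the best state seen so far
        t = max(t * cooling, 1e-6)  # cool down, never reaching exactly zero
    return best


landscape = [1, 3, 5, 4, 2, 6, 9, 7]
best = simulated_annealing(landscape, start=0)
print(best, landscape[best])
```

Early on, the high temperature lets the search walk downhill out of the local maximum at index 2; as T falls, it behaves more and more like plain hill climbing.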
Backtracking is a systematic search technique used to solve constraint satisfaction problems. It builds a solution incrementally, one assignment at a time and abandons a candidate (backtracks) as soon as it violates a constraint. By pruning impossible paths early, backtracking efficiently explores the solution space while guaranteeing a valid solution if one exists.
How Backtracking Works:
1. Start with an empty or partial solution.
2. Assign a value to a variable.
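3. Check if the assignment satisfies all constraints: if it does, move on to the next variable; if it does not, undo the assignment and try the next value, backtracking to the previous variable when no values remain.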
4. Repeat until all variables are assigned or all possibilities are exhausted.
Examples:
Advantages:
Limitations:
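The N-Queens puzzle is a standard illustration of these steps. In this minimal sketch, each row is a variable, the column is its value, and the constraints forbid two queens from sharing a column or a diagonal.

```python
from typing import List


def solve_n_queens(n: int) -> List[List[int]]:
    solutions: List[List[int]] = []
    cols: List[int] = []  # cols[r] = column of the queen placed in row r

    def safe(row: int, col: int) -> bool:
        # Constraint check against every queen already placed.
        return all(c != col and abs(c - col) != row - r for r, c in enumerate(cols))

    def place(row: int) -> None:
        if row == n:  # all variables assigned
            solutions.append(cols.copy())
            return
        for col in range(n):
            if safe(row, col):
                cols.append(col)  # assign
                place(row + 1)    # extend the partial solution
                cols.pop()        # backtrack: undo and try the next value

    place(0)
    return solutions


print(solve_n_queens(4))         # [[1, 3, 0, 2], [2, 0, 3, 1]]
print(len(solve_n_queens(8)))    # 92
```

Because `safe` rejects a column as soon as it conflicts, whole subtrees of placements are never generated.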
Adversarial search is a type of search used in competitive environments where multiple agents (players) have conflicting goals. Unlike standard search problems, the outcome depends not only on the actions of the searching agent but also on the actions of opponents. The goal of adversarial search is to maximize an agent’s advantage while minimizing the opponent’s advantage. This is typical in games such as chess, tic-tac-toe or checkers where one player’s gain is another player’s loss.
How Adversarial Search Works
Example: Tic-Tac-Toe
Example: Chess
1. Minimax algorithm: It is a decision-making algorithm used in adversarial search problems such as games where two players have opposing objectives. It assumes that one player (Max) aims to maximize their utility while the other player (Min) aims to minimize Max’s utility. The algorithm explores the game tree, evaluating all possible moves and counter-moves to determine the optimal strategy for the player.
How Minimax Works
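1. Represent the game as a tree of possible moves, where nodes are game states, edges are moves and Max and Min move at alternating levels.
2. Evaluate terminal nodes using a utility function (e.g., +1 for win, -1 for loss, 0 for draw).
3. Recursively back up the values: a Max node takes the maximum of its children's values and a Min node takes the minimum.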
Example (Tic-Tac-Toe):
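A full tic-tac-toe tree is too large to print, so this minimal sketch runs Minimax on a small hypothetical two-ply tree: nested lists are internal nodes and integers are terminal utilities from Max's point of view.

```python
from typing import List, Union

Tree = Union[int, List["Tree"]]


def minimax(node: Tree, maximizing: bool) -> int:
    if isinstance(node, int):  # terminal node: utility value
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


# Max to move at the root; each inner list is a Min node over terminal utilities.
tree = [[3, 5], [6, 9], [1, 2], [0, -1]]
scores = [minimax(child, maximizing=False) for child in tree]
best_move = scores.index(max(scores))
print(scores, best_move)  # [3, 6, 1, -1] 1
```

Max picks move 1 because, assuming Min replies optimally, it guarantees a utility of 6.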
2. Alpha-Beta Pruning: Alpha-Beta Pruning is an enhancement of Minimax that reduces the number of nodes evaluated in the game tree by eliminating branches that cannot influence the final decision, improving efficiency without affecting the optimality of the result.
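Introduces two values: α, the best (highest) value that Max can guarantee so far along the current path, and β, the best (lowest) value that Min can guarantee so far.
While traversing the tree: If α ≥ β, the branch can be pruned (no need to explore further).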
Result: Same optimal decision as Minimax but with fewer nodes evaluated which is crucial in games with large state spaces like chess.
Example (Chess): In a complex chess position, Alpha-Beta Pruning allows the program to skip exploring moves that cannot possibly improve the outcome, significantly speeding up decision-making without sacrificing accuracy.
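A minimal Alpha-Beta sketch on a small hypothetical tree (nested lists are internal nodes, integers are terminal utilities). The `visited` list records which leaves were actually evaluated, so the saving over plain Minimax is visible.

```python
import math
from typing import List, Optional, Union

Tree = Union[int, List["Tree"]]


def alpha_beta(node: Tree, maximizing: bool, alpha: float = -math.inf,
               beta: float = math.inf, visited: Optional[List[int]] = None) -> float:
    if visited is None:
        visited = []
    if isinstance(node, int):
        visited.append(node)  # record each evaluated leaf
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alpha_beta(child, False, alpha, beta, visited))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # Min will never allow this branch: prune remaining children
        return value
    value = math.inf
    for child in node:
        value = min(value, alpha_beta(child, True, alpha, beta, visited))
        beta = min(beta, value)
        if alpha >= beta:
            break  # Max already has a better option elsewhere: prune
    return value


tree = [[3, 5], [6, 9], [1, 2], [0, -1]]
leaves: List[int] = []
print(alpha_beta(tree, True, visited=leaves), leaves)  # 6 [3, 5, 6, 9, 1, 0]
```

The result (6) matches Minimax, but only 6 of the 8 leaves are evaluated: once the third and fourth Min nodes see a value below α = 6, their remaining children are skipped.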
A Constraint Satisfaction Problem (CSP) is a type of problem in Artificial Intelligence where the goal is to find values for a set of variables while satisfying a set of constraints. Unlike standard search problems, CSPs focus on constraints between variables rather than a sequential path. Solving a CSP involves finding an assignment of values to all variables that does not violate any constraints, making it a natural framework for many real-world problems that involve planning, scheduling or configuration.
Types of CSPs
Real-Life Applications
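Map coloring is a standard textbook CSP. In this minimal sketch, the variables are the regions of a simplified map of Australia, each domain is three colors, and every constraint says that adjacent regions must differ. The map data is the usual textbook version, and the solver is plain backtracking without heuristics.

```python
from typing import Dict, List, Optional

NEIGHBORS: Dict[str, List[str]] = {  # variables and their binary constraints
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"], "SA": ["WA", "NT", "Q", "NSW", "V"],
    "Q": ["NT", "SA", "NSW"], "NSW": ["Q", "SA", "V"], "V": ["SA", "NSW"], "T": [],
}
COLORS = ["red", "green", "blue"]  # the same domain for every variable


def solve_csp(assignment: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    assignment = {} if assignment is None else assignment
    if len(assignment) == len(NEIGHBORS):
        return dict(assignment)  # complete, consistent assignment
    var = next(v for v in NEIGHBORS if v not in assignment)  # next unassigned variable
    for color in COLORS:
        # Constraint: no already-colored neighbor may have the same color.
        if all(assignment.get(n) != color for n in NEIGHBORS[var]):
            assignment[var] = color
            result = solve_csp(assignment)
            if result is not None:
                return result
            del assignment[var]  # undo and try the next color
    return None


print(solve_csp())
```

Real solvers usually add variable-ordering heuristics (such as choosing the most constrained variable first) and constraint propagation, but the formulation of variables, domains and constraints stays the same.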
State-space search strategies are fundamental in AI for problem-solving where the goal is to find a sequence of actions that leads from an initial state to a goal state. Forward state-space search begins at the initial state and explores successors until the goal is reached while backward state-space search starts from the goal state and works backward to determine which predecessor states could lead to it. Both strategies systematically explore the problem space but differ in their starting points and the way they expand the search tree.
1. Forward State-Space Search
2. Backward State-Space Search
Comparison:
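A minimal sketch of both directions on a small made-up graph: forward search runs BFS over successors from the initial state, while backward search runs the same BFS over predecessors (edges inverted) from the goal, and its path is then read in reverse.

```python
from collections import deque
from typing import Dict, List, Optional

Graph = Dict[str, List[str]]


def bfs_path(graph: Graph, start: str, goal: str) -> Optional[List[str]]:
    frontier, visited = deque([[start]]), {start}
    while frontier:
        path = frontier.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                frontier.append(path + [nxt])
    return None


def predecessors(graph: Graph) -> Graph:
    # Backward search expands predecessors, so invert every edge.
    rev: Graph = {node: [] for node in graph}
    for node, succs in graph.items():
        for s in succs:
            rev.setdefault(s, []).append(node)
    return rev


graph = {"S": ["A", "B"], "A": ["G"], "B": ["A"], "G": []}
forward = bfs_path(graph, "S", "G")                 # initial state -> goal
backward = bfs_path(predecessors(graph), "G", "S")  # goal -> initial state
print(forward, backward[::-1])  # ['S', 'A', 'G'] ['S', 'A', 'G']
```

Backward search tends to pay off when the goal has few predecessors but the initial state has many successors.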
Local optima are points in the search space where a local search algorithm, such as hill climbing, cannot find any neighboring state that improves the evaluation function, even though better solutions exist elsewhere in the space. In other words, the algorithm is “stuck” at a suboptimal peak (or valley for minimization problems) because it only considers immediate neighbors and ignores the global structure of the search space.
Key Points
Example
In search and optimization algorithms, especially in local search and reinforcement learning, exploration and exploitation represent two competing strategies. Exploration involves trying out new, unvisited states or actions to gather more information about the search space. Exploitation, on the other hand, focuses on using the current knowledge to select the best-known options to improve performance. Balancing these two strategies is critical because excessive exploration can waste time on suboptimal paths while excessive exploitation can lead the algorithm to get trapped in local optima or miss better solutions.
1. Exploration:
2. Exploitation:
3. Examples
Hill Climbing / Local Search:
Reinforcement Learning:
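A common way to balance the two in reinforcement learning is an ε-greedy policy: with probability ε the agent explores by picking a random action, and otherwise it exploits the action with the highest estimated value. This minimal sketch applies it to a multi-armed bandit; the arm means, the Gaussian reward noise and the parameter values are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def epsilon_greedy(estimates: Sequence[float], epsilon: float, rng: random.Random) -> int:
    if rng.random() < epsilon:
        return rng.randrange(len(estimates))                       # explore: random arm
    return max(range(len(estimates)), key=lambda a: estimates[a])  # exploit: best-known arm


def run_bandit(true_means: Sequence[float], epsilon: float = 0.1,
               steps: int = 1000, seed: int = 0) -> Tuple[List[float], List[int]]:
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)
    counts = [0] * len(true_means)
    for _ in range(steps):
        arm = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward (hypothetical)
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running average
    return estimates, counts


estimates, counts = run_bandit([0.2, 1.0, 0.5])
print(counts)
```

With ε = 0 the agent never explores and can lock onto whichever arm happened to look best first; with ε = 1 it never uses what it has learned.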
Knowledge Representation (KR) in AI is the process of encoding information about the world into a form that a computer system can utilize to solve complex problems. It allows AI systems to reason, infer and make decisions based on stored knowledge. KR is essential because it bridges the gap between raw data and intelligent behavior, enabling machines to understand relationships, constraints and patterns in a structured way. Without effective knowledge representation, AI systems cannot perform reasoning, planning or problem-solving reliably.
| Feature / Aspect | Propositional Logic (PL) | First-Order Logic (FOL) |
|---|---|---|
| Definition | Deals with simple statements (propositions) that are true or false. | Extends PL by including objects, predicates, functions and quantifiers to express relationships between objects. |
| Variables | None | Uses variables to generalize facts and represent objects. |
| Quantifiers | Not supported | Supports universal (∀) and existential (∃) quantifiers. |
| Expressiveness | Limited to simple facts | Highly expressive; can represent relationships and general rules. |
| Complexity | Computationally simpler | More complex due to reasoning over objects, relations and quantifiers. |
| Example Statement | “It is raining” (R); “If it is raining, then the ground is wet” (R → W). | ∀x (Bird(x) → CanFly(x)): “For all x, if x is a bird, then x can fly.” |
| Feature / Aspect | Forward Chaining | Backward Chaining |
|---|---|---|
| Reasoning Direction | Data-driven (from facts to conclusions) | Goal-driven (from goal to facts) |
| Starting Point | Begins with available facts | Begins with the goal or query |
| When Useful | When all possible conclusions need to be inferred | When a specific goal/query needs to be verified |
| Efficiency | Can generate unnecessary facts; may be slower | Focused on the goal; often more efficient |
| Memory Usage | Requires storing all intermediate inferred facts | Uses memory efficiently; only stores relevant facts |
| Example | Medical diagnosis system deriving all possible symptoms and diseases | Expert system checking if a patient has a particular disease |
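A minimal sketch of both strategies over the same set of IF-THEN rules. The rule and fact names are hypothetical; each rule is a pair of (set of premises, conclusion).

```python
from typing import FrozenSet, List, Set, Tuple

Rule = Tuple[FrozenSet[str], str]  # (IF premises, THEN conclusion)

RULES: List[Rule] = [
    (frozenset({"fever", "cough"}), "flu"),
    (frozenset({"flu"}), "needs_rest"),
    (frozenset({"rash"}), "measles"),
]


def forward_chain(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Data-driven: keep firing rules until no new fact can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and premises <= known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal: str, facts: Set[str], rules: List[Rule],
                   seen: FrozenSet[str] = frozenset()) -> bool:
    """Goal-driven: prove the goal by recursively proving some rule's premises."""
    if goal in facts:
        return True
    if goal in seen:  # guard against cyclic rules
        return False
    return any(
        conclusion == goal
        and all(backward_chain(p, facts, rules, seen | {goal}) for p in premises)
        for premises, conclusion in rules
    )


facts = {"fever", "cough"}
print(sorted(forward_chain(facts, RULES)))  # ['cough', 'fever', 'flu', 'needs_rest']
print(backward_chain("needs_rest", facts, RULES), backward_chain("measles", facts, RULES))  # True False
```

Forward chaining derives every reachable fact, while backward chaining only touches the rules relevant to the query.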
Inference in AI is the process of deriving new facts or conclusions from existing knowledge using logical reasoning or rules. It is a fundamental component of expert systems, rule-based systems and knowledge representation frameworks. Through inference, an AI system can answer queries, make decisions or deduce unknown information based on the knowledge it has stored.
Example: If the knowledge base contains the rule “All birds can fly” and the fact “Tweety is a bird”:
Inference: The system can deduce that Tweety can fly.
In AI, an ontology is a formal representation of knowledge that defines a set of concepts, categories and relationships within a domain. It provides a structured vocabulary and a framework for describing entities, their properties and interconnections. Ontologies are essential for reasoning because they allow AI systems to infer new knowledge, detect inconsistencies and answer complex queries by understanding the relationships and constraints within the domain. Essentially, ontologies enable machines to “understand” the semantics of a domain rather than just processing raw data.
How Ontologies Help in Reasoning
Example: In a medical ontology:
Using reasoning, the system can deduce: If a patient has certain symptoms, it may infer possible diseases and recommend treatments.
Reasoning in AI is the process of drawing conclusions from knowledge. Different types of reasoning determine how conclusions are derived from known information. The main types are deductive, inductive and abductive reasoning, each with its own approach and use cases.
| Type of Reasoning | Definition | Example | Use in AI |
|---|---|---|---|
| Deductive | Derives conclusions that are logically certain from known facts or rules. | Facts: “All birds can fly. Tweety is a bird.” → Conclusion: “Tweety can fly.” | Rule-based systems, expert systems, logic programming |
| Inductive | Generalizes patterns or rules from specific observations; conclusions are probabilistic. | Observation: “Swan1 is white, Swan2 is white” → Conclusion: “All swans are white.” | Machine learning, pattern recognition, probabilistic reasoning |
| Abductive | Infers the most likely explanation for observed facts; used when information is incomplete. | Observation: “Grass is wet.” → Possible explanation: “It rained last night.” | Diagnosis systems, fault detection, hypothesis generation |
A Bayesian Network (BN) is a graphical model that represents probabilistic relationships among a set of variables using a directed acyclic graph (DAG). Each node in the graph corresponds to a variable and edges represent direct dependencies between variables. Bayesian networks allow AI systems to reason under uncertainty by encoding conditional probabilities and using them to compute the likelihood of different outcomes given observed evidence. They combine both graphical structure and probabilistic inference, making them useful for complex reasoning tasks.
Example:
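As an illustration, consider a hypothetical three-node network in which Rain and Sprinkler are independent parents of WetGrass. This minimal sketch answers P(Rain | WetGrass = true) by inference by enumeration: the joint probability factorizes along the DAG, the hidden variable is summed out, and the result is normalized. All probability values are made up.

```python
from itertools import product

# Hypothetical CPTs: Rain and Sprinkler are independent causes of WetGrass.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER = {True: 0.1, False: 0.9}
P_WET_GIVEN = {  # P(WetGrass=True | Rain, Sprinkler)
    (True, True): 0.99, (True, False): 0.9,
    (False, True): 0.8, (False, False): 0.0,
}


def joint(rain: bool, sprinkler: bool, wet: bool) -> float:
    # Chain rule along the DAG: P(R) * P(S) * P(W | R, S)
    p_wet = P_WET_GIVEN[(rain, sprinkler)]
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * (p_wet if wet else 1 - p_wet)


def p_rain_given_wet() -> float:
    # Sum out the hidden variable (Sprinkler), then normalize by P(WetGrass=True).
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence


print(round(p_rain_given_wet(), 3))  # 0.74
```

Observing wet grass raises the belief in rain from the prior 0.2 to about 0.74.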
The Dempster-Shafer Theory (DST), also called evidence theory, is a mathematical framework for reasoning under uncertainty. Unlike Bayesian probability which requires prior probabilities for all events, DST allows the representation of degrees of belief for subsets of possibilities, accommodating partial or incomplete information. It combines evidence from multiple sources using Dempster’s rule of combination to calculate the overall belief and plausibility of events.
Example:
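A minimal sketch of Dempster's rule of combination. Mass functions are dictionaries from focal sets (frozensets) to masses. Products of masses whose sets do not intersect count as conflict K, and the remaining masses are renormalized by 1 − K. The two sources (a "sensor" and a "forecast") and their numbers are hypothetical.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def combine(m1: Mass, m2: Mass) -> Mass:
    """Dempster's rule: multiply masses of intersecting focal sets, renormalize by 1 - K."""
    combined: Mass = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # K: mass assigned to contradictory combinations
    if conflict >= 1.0:
        raise ValueError("Sources are in total conflict")
    return {s: m / (1.0 - conflict) for s, m in combined.items()}


def belief(m: Mass, hypothesis: FrozenSet[str]) -> float:
    return sum(v for s, v in m.items() if s <= hypothesis)  # sets fully inside


def plausibility(m: Mass, hypothesis: FrozenSet[str]) -> float:
    return sum(v for s, v in m.items() if s & hypothesis)   # sets that overlap


RAIN, DRY = frozenset({"rain"}), frozenset({"dry"})
THETA = RAIN | DRY  # the whole frame of discernment: "don't know"
sensor = {RAIN: 0.6, THETA: 0.4}
forecast = {RAIN: 0.3, DRY: 0.5, THETA: 0.2}
m = combine(sensor, forecast)
print(round(belief(m, RAIN), 3), round(plausibility(m, RAIN), 3))  # 0.6 0.714
```

The gap between belief (0.6) and plausibility (about 0.714) is the mass left on "don't know", which a single Bayesian probability cannot express.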
| Feature | Monotonic Reasoning | Non-Monotonic Reasoning |
|---|---|---|
| Definition | Once a conclusion is drawn, it remains valid regardless of new information. | Conclusions can change or be retracted when new information is added. |
| Knowledge Update | Adding facts never invalidates previous conclusions. | Adding facts may invalidate previous conclusions. |
| Flexibility | Rigid, less adaptable to changing environments. | Flexible, suitable for dynamic or uncertain environments. |
| Example | Mathematical proofs: “2+2=4” remains true. | “Birds can fly” → Tweety is a penguin → inference “Tweety can fly” is retracted. |
| Use Case | Theorem proving, formal logic systems | Expert systems, commonsense reasoning, AI planning |
| Feature | Symbolic Search Methods | Heuristic Search Methods |
|---|---|---|
| Definition | Explores search space systematically using rules and logic. | Uses domain-specific knowledge (heuristics) to guide search efficiently. |
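| Solution Guarantee | Systematic: BFS and Uniform-Cost Search are guaranteed to find a solution if one exists (DFS only in finite state spaces). | May not guarantee an optimal solution; focuses on likely paths (A* is optimal with an admissible heuristic). |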
| Efficiency | Can be slow and computationally expensive for large spaces. | Generally faster; prioritizes promising states. |
| Approach | Blind or uninformed; no guidance about which path is better. | Informed; uses evaluation functions to choose paths. |
| Examples | BFS, DFS, Uniform-Cost Search | A*, Greedy Best-First Search, Hill Climbing |
| Best Use Case | Small or well-defined search spaces | Large, complex or real-time search problems |
In real-world environments, AI agents often operate with incomplete, uncertain or noisy information. Reasoning under such conditions requires the agent to draw plausible conclusions, make predictions or take decisions despite the uncertainty. Agents use techniques from probabilistic reasoning, belief representation and non-monotonic logic to handle uncertainty. By quantifying uncertainty and updating beliefs based on new evidence, agents can act intelligently even when they do not have complete knowledge of the world.
Key Techniques:
1. Probabilistic Reasoning (Bayesian Networks):
2. Dempster-Shafer Theory:
3. Non-Monotonic Reasoning:
4. Fuzzy Logic:
5. Markov Decision Processes (MDPs):
A Markov Decision Process (MDP) is a mathematical framework used in AI to model sequential decision-making problems under uncertainty. It provides a formal way to represent an agent interacting with a stochastic environment where the outcomes of actions are not deterministic. MDPs are widely used in reinforcement learning, planning and control systems. The defining property of an MDP is the Markov property which states that the future state depends only on the current state and action, not on past states.
Components of an MDP
An MDP is formally defined as a tuple (S, A, P, R, γ):
1. S (States):
2. A (Actions):
3. P (Transition Probabilities):
4. R (Reward Function):
5. γ (Discount Factor):
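Example: Grid world navigation, where states are grid cells, actions are moves (up, down, left, right), transitions may be noisy and rewards mark goals or hazards.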
The Bellman equation provides a recursive decomposition of the value function in an MDP. It expresses the value of a state as the expected sum of immediate reward and the discounted value of successor states. This equation is fundamental in dynamic programming, reinforcement learning and optimal control, as it allows agents to compute optimal policies that maximize cumulative reward over time.
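Bellman Equation for the Value Function: For a given policy π, the value function is:
$$V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^{\pi}(s')\right]$$
Bellman Optimality Equation: To find the optimal policy π*:
$$V^{*}(s) = \max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^{*}(s')\right], \qquad \pi^{*}(s) = \arg\max_{a} \sum_{s'} P(s' \mid s, a)\left[R(s, a, s') + \gamma V^{*}(s')\right]$$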
Role in Decision-Making
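Value iteration turns the Bellman optimality equation into an algorithm: it repeatedly applies the max-backup to every state until the values stop changing, then reads off the greedy policy. In this minimal sketch, P[s][a] lists (probability, next state, reward) triples, so it encodes both P(s' | s, a) and R(s, a, s'). The corridor MDP is hypothetical.

```python
from typing import Dict, List, Tuple

# P[s][a] = list of (probability, next_state, reward) triples.
MDP = Dict[int, Dict[str, List[Tuple[float, int, float]]]]


def q_value(P: MDP, V: Dict[int, float], s: int, a: str, gamma: float) -> float:
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def value_iteration(P: MDP, gamma: float = 0.9, tol: float = 1e-10):
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # terminal state: no actions, value stays 0
                continue
            best = max(q_value(P, V, s, a, gamma) for a in P[s])  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = {s: max(P[s], key=lambda a: q_value(P, V, s, a, gamma)) for s in P if P[s]}
    return V, policy


# Hypothetical 3-state corridor: "right" from state 0 slips (stays put) 20% of the time.
P: MDP = {
    0: {"right": [(0.8, 1, 0.0), (0.2, 0, 0.0)], "stay": [(1.0, 0, 0.0)]},
    1: {"right": [(1.0, 2, 1.0)], "stay": [(1.0, 1, 0.0)]},
    2: {},  # terminal goal state
}
V, policy = value_iteration(P)
print({s: round(v, 3) for s, v in V.items()}, policy)
# {0: 0.878, 1: 1.0, 2: 0.0} {0: 'right', 1: 'right'}
```

For state 0 the fixed point is V(0) = 0.72 + 0.18·V(0), so V(0) = 0.72 / 0.82 ≈ 0.878; the discount and the 20% slip both lower it below V(1) = 1.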
A Hidden Markov Model (HMM) is a statistical model used to represent systems that are assumed to be a Markov process with hidden (unobservable) states. In an HMM, the system transitions between a finite set of hidden states, each of which emits observable outputs probabilistically. HMMs are widely used in AI for sequence modeling, temporal pattern recognition and probabilistic reasoning in situations where the true state of the system is not directly observable.
Key Components
1. States (S): Hidden states of the system (e.g., weather: sunny, rainy).
2. Observations (O): Observable outputs corresponding to each state (e.g., umbrella usage).
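3. Transition Probabilities (A): Probability of moving from one hidden state to another, where q_t denotes the hidden state at time t: $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$.
4. Emission Probabilities (B): Probability of observing a symbol given a state: $b_j(k) = P(o_t = v_k \mid q_t = S_j)$.
5. Initial State Probabilities (π): Probability of starting in each state: $\pi_i = P(q_1 = S_i)$.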
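With these components, the probability of an observation sequence can be computed with the forward algorithm, which sums over all hidden state paths in O(T·N²) time instead of enumerating them. This minimal sketch uses the weather/umbrella example; all probability values are hypothetical.

```python
from typing import List

STATES = ["sunny", "rainy"]
START = {"sunny": 0.6, "rainy": 0.4}                     # pi
TRANS = {"sunny": {"sunny": 0.7, "rainy": 0.3},          # A
         "rainy": {"sunny": 0.4, "rainy": 0.6}}
EMIT = {"sunny": {"umbrella": 0.1, "no_umbrella": 0.9},  # B
        "rainy": {"umbrella": 0.8, "no_umbrella": 0.2}}


def forward(observations: List[str]) -> float:
    """P(observations), summed over every hidden state sequence."""
    # alpha[s] = P(o_1..o_t, q_t = s)
    alpha = {s: START[s] * EMIT[s][observations[0]] for s in STATES}
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * TRANS[prev][s] for prev in STATES) * EMIT[s][obs]
            for s in STATES
        }
    return sum(alpha.values())


print(round(forward(["umbrella", "umbrella"]), 3))  # 0.185
```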
Applications
In AI and decision theory, utility is a quantitative measure of the desirability or preference of a particular outcome. It allows an agent to rank possible outcomes and make rational choices. Expected utility extends this concept to uncertain or probabilistic environments by combining the utility of each possible outcome with its probability. Rational agents choose actions that maximize expected utility, ensuring optimal decision-making even when the consequences of actions are uncertain.
Key Concepts
1. Utility (U):
2. Expected Utility (EU): Accounts for uncertainty in outcomes by weighting each outcome’s utility by its probability.
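Formula:
$$EU(a) = \sum_{s} P(s \mid a)\, U(s)$$
Where P(s | a) is the probability that outcome s occurs when action a is taken and U(s) is the utility of that outcome.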
3. Optimal Decision Rule:
The agent selects the action that maximizes expected utility:
$$a^{*} = \arg\max_{a} EU(a)$$
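A minimal sketch of the decision rule. Each action maps to a list of (probability, utility) pairs for its possible outcomes; the umbrella scenario and its numbers are hypothetical.

```python
from typing import Dict, List, Tuple

Outcomes = List[Tuple[float, float]]  # (probability, utility) per outcome


def expected_utility(outcomes: Outcomes) -> float:
    return sum(p * u for p, u in outcomes)


def best_action(actions: Dict[str, Outcomes]) -> str:
    return max(actions, key=lambda a: expected_utility(actions[a]))


actions = {
    "take_umbrella": [(0.3, 70.0), (0.7, 80.0)],   # (rain, no rain)
    "leave_umbrella": [(0.3, 0.0), (0.7, 100.0)],
}
print(best_action(actions))  # take_umbrella
```

Taking the umbrella has EU = 0.3·70 + 0.7·80 = 77, while leaving it has EU = 0.3·0 + 0.7·100 = 70, so the agent takes it even though leaving it gives the best single outcome.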
A Partially Observable Markov Decision Process (POMDP) is an extension of the standard MDP that models decision-making under uncertainty when the agent cannot fully observe the environment’s state. In a POMDP, the agent maintains a belief state which is a probability distribution over possible actual states and chooses actions based on this belief. They are widely used in AI planning for robotics, autonomous navigation and intelligent agents where sensors provide noisy or incomplete information about the environment.
Components of a POMDP
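A POMDP is defined as a tuple (S, A, P, R, Ω, O, γ), where S, A, P, R and γ are as in an MDP, Ω is the set of possible observations and O(o | s', a) is the probability of observing o after taking action a and arriving in state s'.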
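The belief state is updated after every action a and observation o with b'(s') = η · O(o | s', a) · Σ_s P(s' | s, a) · b(s), where η normalizes the result to sum to 1. This minimal sketch implements that update for a hypothetical two-door problem: "listen" does not move the agent, and it hears the correct side 85% of the time.

```python
from typing import Dict, Tuple

Belief = Dict[str, float]
Transition = Dict[Tuple[str, str], Dict[str, float]]  # (s, a) -> {s': P(s' | s, a)}
ObsModel = Dict[Tuple[str, str], Dict[str, float]]    # (s', a) -> {o: O(o | s', a)}


def update_belief(belief: Belief, action: str, observation: str,
                  transition: Transition, obs_model: ObsModel) -> Belief:
    new_belief = {}
    for s_next in belief:
        # Prediction step: probability of landing in s_next under the old belief.
        predicted = sum(transition[(s, action)].get(s_next, 0.0) * p for s, p in belief.items())
        # Correction step: weight by how likely the observation is in s_next.
        new_belief[s_next] = obs_model[(s_next, action)].get(observation, 0.0) * predicted
    total = sum(new_belief.values())
    if total == 0:
        raise ValueError("Observation is impossible under the current belief")
    return {s: p / total for s, p in new_belief.items()}


transition = {("left", "listen"): {"left": 1.0}, ("right", "listen"): {"right": 1.0}}
obs_model = {("left", "listen"): {"hear_left": 0.85, "hear_right": 0.15},
             ("right", "listen"): {"hear_left": 0.15, "hear_right": 0.85}}
b0 = {"left": 0.5, "right": 0.5}
b1 = update_belief(b0, "listen", "hear_left", transition, obs_model)
b2 = update_belief(b1, "listen", "hear_left", transition, obs_model)
print(b1, b2)
```

Each consistent observation sharpens the belief (0.5 → 0.85 → about 0.97), and a POMDP policy chooses actions based on this distribution rather than on a known state.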
| Feature | Deterministic Environment | Stochastic Environment |
|---|---|---|
| Definition | Next state is fully predictable given current state and action | Next state is probabilistic; may vary even for the same action |
| Outcome of Actions | Single, definite outcome | Multiple possible outcomes with probabilities |
| Planning Complexity | Easier to plan and compute optimal paths | Requires probabilistic reasoning or expected utility calculations |
| Example | Chess (ignoring opponent randomness) | Robot navigation with slippery floors or sensor noise |
| Algorithm Suitability | Classical search methods (DFS, BFS, A*) | MDPs, POMDPs, reinforcement learning |
A heuristic function in Artificial Intelligence is an evaluation function that provides an estimate of the cost or distance from a given state to the goal. It does not guarantee exact values but helps the search algorithm decide which paths are more promising to explore. By prioritizing nodes with lower heuristic values, search algorithms can significantly reduce the search space and improve efficiency.
How Heuristics Guide Search:
Heuristic functions guide search by telling the algorithm which states are more promising to explore first. Instead of blindly expanding all possible states (as in uninformed search), heuristics help the agent focus on paths that seem closer to the goal. Different algorithms use heuristics in different ways:
1. Greedy Best-First Search
Formula: $f(n) = h(n)$
2. A* Search
Formula: $f(n) = g(n) + h(n)$
3. Hill Climbing & Local Search
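Two classic heuristics for the 8-puzzle show what such estimates look like in practice. This minimal sketch represents a board as a 9-tuple in row-major order with 0 as the blank; the goal layout is an assumption.

```python
from typing import Tuple

State = Tuple[int, ...]  # 3x3 board in row-major order, 0 = blank
GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def misplaced_tiles(state: State, goal: State = GOAL) -> int:
    # Number of tiles (ignoring the blank) not in their goal position.
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)


def manhattan(state: State, goal: State = GOAL) -> int:
    # Sum of row + column distances of each tile from its goal square.
    where = {tile: divmod(i, 3) for i, tile in enumerate(goal)}
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        row, col = divmod(i, 3)
        goal_row, goal_col = where[tile]
        total += abs(row - goal_row) + abs(col - goal_col)
    return total


start = (8, 1, 3, 4, 0, 2, 7, 6, 5)
print(misplaced_tiles(start), manhattan(start))  # 5 10
```

Both heuristics are admissible because every misplaced tile needs at least one move and at least its Manhattan distance in moves. Manhattan distance is never smaller than the misplaced-tile count, so it usually guides A* more effectively.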
An Expert System is an AI-based software application designed to simulate human expertise in a specific domain. It uses a knowledge base of facts and rules along with an inference engine to reason about data and provide solutions, explanations or recommendations. Expert systems were among the earliest successful applications of AI and are widely used in medical diagnosis, engineering and troubleshooting systems.
Main Components of an Expert System
1. Knowledge Base
2. Inference Engine
3. User Interface
4. Explanation Facility
5. Knowledge Acquisition Module
In an expert system, production rules are the basic units of knowledge representation. They follow an IF–THEN format where the IF part represents a condition and the THEN part specifies an action or conclusion. The inference engine continuously checks which rules are applicable based on the current facts in the knowledge base and then applies (or “fires”) them to derive new knowledge.
How They Work
General Rule Structure
Example
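A minimal sketch of the match-fire cycle of a production system. Rules whose IF part is satisfied by working memory form the agenda, one is selected (here simply the first in listed order, a basic conflict-resolution strategy), and its THEN part is added to memory. The car-troubleshooting rules and fact names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set, Tuple


@dataclass(frozen=True)
class Rule:
    name: str
    conditions: FrozenSet[str]  # IF part: all must be in working memory
    conclusion: str             # THEN part: fact added when the rule fires


def run_production_system(rules: List[Rule], facts: Set[str]) -> Tuple[Set[str], List[str]]:
    memory, fired = set(facts), []
    while True:
        # Match: rules whose IF part holds and whose THEN part is not yet known.
        agenda = [r for r in rules if r.conditions <= memory and r.conclusion not in memory]
        if not agenda:
            return memory, fired
        rule = agenda[0]              # Conflict resolution: first applicable rule.
        memory.add(rule.conclusion)   # Fire the rule.
        fired.append(rule.name)


rules = [
    Rule("R1", frozenset({"no_start", "lights_ok"}), "battery_ok"),
    Rule("R2", frozenset({"battery_ok", "fuel_gauge_empty"}), "advise_refuel"),
]
memory, fired = run_production_system(rules, {"no_start", "lights_ok", "fuel_gauge_empty"})
print(fired, "advise_refuel" in memory)  # ['R1', 'R2'] True
```

R2 cannot fire at first; it becomes applicable only after R1 adds `battery_ok`, which is how chains of rules derive new knowledge.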
Expert systems are AI programs that simulate human expertise within a specific domain by using a knowledge base and inference engine. They have been widely used in fields such as medical diagnosis, engineering troubleshooting and financial advising. While they offer many benefits, they also come with limitations that affect their applicability in real-world scenarios.
Advantages
Disadvantages
1. Knowledge acquisition: It refers to the process of extracting, structuring and formalizing expert knowledge so it can be stored in the knowledge base of an expert system. This usually involves collaboration with human experts, analysis of domain-specific problems and encoding rules in a machine-usable format.
2. Knowledge Engineering: Knowledge engineering is the broader discipline of designing, building and maintaining expert systems. It involves not only knowledge acquisition but also organizing, updating, testing and validating the knowledge base. Knowledge engineers act as intermediaries between domain experts and the system, ensuring the expert system can reason effectively.
Key Tasks of Knowledge Engineers:
A rule-based system is an Artificial Intelligence (AI) system that stores knowledge in the form of rules (IF–THEN statements) and uses these rules to make inferences or decisions. It is one of the earliest and most widely used methods for representing and reasoning with knowledge in AI. By systematically applying rules to known facts, the system can derive new knowledge, solve problems and support decision-making in domains like medical diagnosis, expert advisory systems and troubleshooting.
How It Infers New Knowledge:
1. Knowledge Base: Contains facts (data about the world) and rules (domain knowledge).
2. Inference Engine: The reasoning mechanism that applies rules to facts.
3. Rule Firing: When the conditions (IF part) of a rule are satisfied, the system executes the action/conclusion (THEN part), adding new knowledge to the knowledge base.
Example
Fuzzy Logic is a form of logic that deals with reasoning under uncertainty, vagueness and partial truth. Unlike classical Boolean logic which assigns values as strictly True (1) or False (0), fuzzy logic allows values to range continuously between 0 and 1, representing degrees of truth.
This makes it especially useful in modeling human-like reasoning where concepts are not always black-and-white (e.g., "the weather is warm" or "the glass is half full").
Mathematical Representation
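A fuzzy set A in universe X is defined as:
$$A = \{(x, \mu_A(x)) \mid x \in X\}$$
where $\mu_A : X \to [0, 1]$ is the membership function, giving the degree to which x belongs to A.
Example: If $\mu_{\text{hot}}(28^{\circ}\text{C}) = 0.7$, it means 28°C is "70% hot".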
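A minimal sketch of a membership function for "hot". The linear ramp and its breakpoints (0 at or below 21°C, 1 at or above 31°C) are hypothetical choices that happen to give μ_hot(28°C) = 0.7.

```python
def ramp_up(x: float, low: float, high: float) -> float:
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def mu_hot(temp_c: float) -> float:
    # Hypothetical definition of "hot": 0 at or below 21°C, 1 at or above 31°C.
    return ramp_up(temp_c, 21.0, 31.0)


print([round(mu_hot(t), 2) for t in (15, 21, 26, 28, 35)])  # [0.0, 0.0, 0.5, 0.7, 1.0]
```

In practice, triangular, trapezoidal or Gaussian shapes are also common; the choice is a modeling decision rather than something fixed by fuzzy logic itself.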
Fuzzy Logic is an extension of classical Boolean logic that allows reasoning with degrees of truth rather than strict true/false values. While Boolean logic works only with binary states (0 or 1), fuzzy logic introduces a continuum of values between 0 and 1, making it more suitable for real-world scenarios where uncertainty, vagueness and imprecision exist (e.g., “warm,” “tall,” “high speed”).
| Aspect | Classical Boolean Logic | Fuzzy Logic |
|---|---|---|
| Truth Values | Strictly binary: either 0 (False) or 1 (True) | Continuous range between 0 and 1 (e.g., 0.2, 0.7) |
| Nature of Reasoning | Crisp, exact, deterministic | Approximate, handles uncertainty and vagueness |
| Example Statement | “The room is hot” → either True (1) or False (0) | “The room is hot” → true to degree 0.7 (partial truth) |
| Mathematical Basis | Set theory (clear membership: in or out of a set) | Fuzzy set theory (partial membership with degree of belonging) |
| Applications | Digital circuits, binary decision-making, database queries | Control systems, washing machines, medical diagnosis, robotics, natural language processing |
| Flexibility | Rigid, cannot handle imprecision | Flexible, models human-like reasoning |
Fuzzy logic is widely used in real-world AI systems and control applications where human-like reasoning is needed to handle uncertainty, vagueness or partial truths. By assigning degrees of truth rather than binary values, fuzzy logic allows systems to make smooth, adaptive and intelligent decisions in environments that are too complex or imprecise for classical Boolean logic.
Real-Life Applications
1. Washing Machines: Uses fuzzy logic to adjust water level, washing time and detergent usage based on factors such as:
Example: A medium load with slightly dirty clothes → medium water + moderate wash time.
2. Air Conditioners / Climate Control: Adjusts temperature and fan speed based on:
Allows smooth transitions rather than ON/OFF extremes.
3. Automobile Systems:
4. Cameras
5. Industrial Process Control
6. Robotics
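The washing-machine example can be sketched as a tiny fuzzy controller. In this minimal sketch, inputs on a 0 to 10 scale are fuzzified with linear "low"/"high" memberships, AND is taken as min and OR as max, and the output is a weighted average of crisp wash times (a zero-order Sugeno-style scheme). The membership shapes, rules and minute values are all hypothetical; real appliances may use different inference and defuzzification methods.

```python
def low(x: float) -> float:
    # Membership in "low" on a 0-10 scale (hypothetical linear shape).
    return max(0.0, min(1.0, 1.0 - x / 10.0))


def high(x: float) -> float:
    return max(0.0, min(1.0, x / 10.0))


def wash_time(dirt: float, load: float) -> float:
    """Rule strengths (min for AND, max for OR) weight crisp outputs in minutes."""
    rules = [
        (min(low(dirt), low(load)), 20.0),          # IF dirt low AND load low THEN short
        (max(min(low(dirt), high(load)),            # IF (dirt low AND load high)
             min(high(dirt), low(load))), 40.0),    #   OR (dirt high AND load low) THEN medium
        (min(high(dirt), high(load)), 60.0),        # IF dirt high AND load high THEN long
    ]
    total_strength = sum(w for w, _ in rules)
    return sum(w * minutes for w, minutes in rules) / total_strength


print(wash_time(0, 0), wash_time(5, 5), wash_time(10, 10))  # 20.0 40.0 60.0
```

Because rule strengths vary continuously, the wash time changes smoothly between these points instead of jumping between fixed settings.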
Deterministic reasoning assumes that the environment and the outcomes of actions are fully predictable. Every action taken in a given state leads to a known and definite result, so reasoning can be done with certainty.
In contrast, reasoning under uncertainty deals with situations where the agent does not have complete knowledge of the environment or where outcomes are probabilistic. Agents must make decisions using probabilities, beliefs or approximate reasoning to handle incomplete, noisy or ambiguous information.
| Feature | Deterministic Reasoning | Reasoning Under Uncertainty |
|---|---|---|
| Outcome Predictability | Fully predictable; one action → one known result | Probabilistic; one action → multiple possible results with certain probabilities |
| Knowledge Requirement | Complete knowledge of environment and rules | Partial or uncertain knowledge; may rely on observations or beliefs |
| Decision Making | Straightforward; logical deduction suffices | Requires probabilistic reasoning, expected utility or fuzzy logic |
| Algorithms Used | Classical search algorithms: DFS, BFS, A*, uniform-cost search | Bayesian networks, Markov Decision Processes (MDPs), POMDPs, fuzzy reasoning |
| Example | Chess without randomness (deterministic moves) | Robot navigation with sensor noise or slippery surfaces |
| Error Handling | Errors only from incorrect logic or rules | Errors arise from uncertainty in observations or stochastic effects |
Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. The agent’s goal is to learn a policy that maximizes cumulative reward over time. Unlike supervised learning, RL does not rely on labeled data; instead, the agent explores and learns from trial-and-error interactions.
Key Components of Reinforcement Learning
In Reinforcement Learning (RL), reward maximization is the process by which an agent learns to choose actions that maximize the cumulative reward over time. Instead of focusing solely on immediate gains, the agent considers the long-term consequences of its actions and adapts its behavior to achieve the highest overall reward.
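1. Immediate Reward ($r_t$) – The feedback received from the environment after performing an action at time $t$.
2. Cumulative Reward / Return ($G_t$) – The total discounted reward collected from time $t$ onward:
$$G_t = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots = \sum_{k=0}^{\infty} \gamma^k r_{t+k}$$
where $\gamma$ ($0 \le \gamma \le 1$) is the discount factor, which balances immediate vs. future rewards.
3. Value Function ($V^{\pi}(s)$) – Measures the expected cumulative reward if the agent starts in state $s$ and follows policy $\pi$:
$$V^{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid s_t = s \right]$$
4. Optimal Policy ($\pi^*$) – The strategy that maximizes expected cumulative reward for all states:
$$\pi^* = \arg\max_{\pi} V^{\pi}(s) \quad \text{for all states } s$$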
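The return of a finite episode can be computed by working backwards with the recursion $G_t = r_t + \gamma G_{t+1}$. The Python sketch below assumes rewards after the episode ends are zero; the function names and the three-step episode are hypothetical.

```python
def discounted_return(rewards, gamma):
    """G_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for rewards listed from time t onward."""
    g = 0.0
    # Work backwards using the recursion G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def returns_per_step(rewards, gamma):
    """Return [G_0, G_1, ..., G_{T-1}] for a finite episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return returns[::-1]


# Hypothetical three-step episode that only pays off at the end
episode_rewards = [0.0, 0.0, 1.0]
print(discounted_return(episode_rewards, gamma=0.9))
print(returns_per_step(episode_rewards, gamma=0.9))
```

With $\gamma = 0.9$, the single reward of 1 at the last step is worth $0.9^2 = 0.81$ at time 0; a smaller $\gamma$ shrinks it further, making the agent more short-sighted.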
How It Works:
Q-Learning is a model-free reinforcement learning algorithm used to learn the optimal action-selection policy for an agent interacting with an environment. It does not require prior knowledge of the environment’s dynamics (transition probabilities). Instead, the agent learns from trial-and-error experiences by updating a Q-value table which represents the expected cumulative reward for taking an action in a given state.
Q-Learning Update Rule
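The Q-values are updated iteratively using a rule derived from the Bellman optimality equation:
$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
where $s$ and $a$ are the current state and action, $r$ is the reward received, $s'$ is the resulting next state, $\alpha$ ($0 < \alpha \le 1$) is the learning rate and $\gamma$ is the discount factor. The bracketed term is the temporal-difference (TD) error: the gap between the new estimate $r + \gamma \max_{a'} Q(s', a')$ and the current value $Q(s, a)$.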
How Q-Learning Works
1. Initialize Q-table with arbitrary values (often zeros).
2. For each step:
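3. Repeat over many episodes until the Q-values converge; acting greedily with respect to the converged Q-values gives the optimal policy. (In the tabular case, convergence is guaranteed when every state-action pair keeps being visited and the learning rate is decayed appropriately.)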
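The steps above can be sketched as a small tabular Q-learning loop in Python. The corridor environment (five states, reward 1 for reaching the right end), the ε-greedy action choice and the hyperparameters are assumptions chosen for illustration, not part of the algorithm's definition.

```python
import random

N_STATES = 5         # hypothetical corridor: states 0..4, state 4 is the terminal goal
ACTIONS = [0, 1]     # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic environment: return (next_state, reward, done)."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]   # 1. initialize Q-table with zeros
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # 2a. epsilon-greedy action choice (ties among greedy actions broken at random)
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            # 2b. take the action, observe reward and next state
            next_state, reward, done = step(state, action)
            # 2c. Q-learning update; a terminal next state contributes no future value
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy)
```

With these settings the learned greedy policy is expected to choose "move right" (action 1) in every non-terminal state, and $Q(s, \text{right})$ approaches $\gamma^{3-s}$ for state $s$.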
Both Q-Learning and SARSA are model-free reinforcement learning algorithms used to learn the optimal action-selection policy for an agent interacting with an environment.
Both algorithms aim to maximize cumulative reward, but their learning behavior differs depending on whether they consider the optimal future action or the actual exploratory action.
| Feature | Q-Learning | SARSA |
|---|---|---|
| Policy Type | Off-policy: Learns optimal policy independent of actions taken | On-policy: Learns policy based on actions actually taken |
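| Q-Value Update Rule | $Q(s,a) \leftarrow Q(s,a) + \alpha [ r + \gamma \max_{a'} Q(s',a') - Q(s,a) ]$ | $Q(s,a) \leftarrow Q(s,a) + \alpha [ r + \gamma Q(s',a') - Q(s,a) ]$, where $a'$ is the action actually chosen in $s'$ |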
| Future Action Consideration | Considers best possible action in the next state | Considers actual action taken in the next state |
| Exploration Handling | Ignores exploratory moves; assumes optimal action | Updates Q-values based on exploratory actions |
| Convergence | Often faster in deterministic environments | Safer in stochastic or risky environments; may converge slower |
| Example Scenario | Grid-world with predictable rewards | Grid-world with uncertain or risky rewards |
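The only difference between the two update rules is the target they move $Q(s, a)$ toward. This Python sketch computes both targets for the same hypothetical transition, where the agent's next action (chosen while exploring) is not the greedy one; the function names and numbers are illustrative.

```python
def q_learning_target(reward, next_q_values, gamma):
    """Off-policy target: bootstrap from the best action value in the next state."""
    return reward + gamma * max(next_q_values)


def sarsa_target(reward, next_q_values, next_action, gamma):
    """On-policy target: bootstrap from the action the agent actually takes next."""
    return reward + gamma * next_q_values[next_action]


def td_update(q_sa, target, alpha):
    """Move the current estimate Q(s, a) a fraction alpha toward the target."""
    return q_sa + alpha * (target - q_sa)


# Hypothetical transition: Q(s', .) = [2.0, 5.0], and the exploring agent picks action 0 next
next_q = [2.0, 5.0]
q_sa, reward, gamma, alpha = 1.0, 0.0, 0.9, 0.5
print("Q-Learning:", td_update(q_sa, q_learning_target(reward, next_q, gamma), alpha))
print("SARSA:     ", td_update(q_sa, sarsa_target(reward, next_q, 0, gamma), alpha))
```

Here Q-Learning moves $Q(s, a)$ from 1.0 to 2.75 using the best next value (5.0), while SARSA moves it only to about 1.4 because it uses the value of the exploratory action actually taken (2.0).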
In Reinforcement Learning (RL), an agent must choose actions to maximize cumulative reward over time. The exploration vs exploitation trade-off is a fundamental challenge:
Balancing these two is crucial: too much exploration can waste time on suboptimal actions while too much exploitation can prevent the agent from finding the globally optimal policy.
| Aspect | Exploration | Exploitation |
|---|---|---|
| Goal | Discover new strategies or states | Use known strategies to maximize immediate reward |
| Action Choice | Random or less-known actions | Actions with highest expected Q-value |
| Risk | May lead to suboptimal or negative rewards | May miss better long-term rewards |
| Learning Effect | Helps the agent learn more about the environment | Solidifies knowledge about known good actions |
| Example | Trying a new path in a maze | Following a path that previously gave high rewards |
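A common way to balance the two is ε-greedy action selection: explore with probability ε, otherwise exploit the action with the highest Q-value. A minimal Python sketch, with hypothetical Q-values for a single state:

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action); otherwise exploit (argmax Q)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                        # exploration
    return max(range(len(q_values)), key=lambda a: q_values[a])    # exploitation


rng = random.Random(42)
q_values = [0.2, 0.8, 0.5]   # hypothetical Q-values for three actions in one state
choices = [epsilon_greedy(q_values, epsilon=0.1, rng=rng) for _ in range(10_000)]
# Expected share of action 1: 0.9 (exploit) + 0.1 * 1/3 (explore and happen to pick it)
print(choices.count(1) / len(choices))
```

With ε = 0.1 and three actions, the greedy action is picked about 93% of the time, since random exploration can also land on it. Decaying ε over training is a common refinement.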
In Reinforcement Learning (RL), agents can learn to make decisions using two main approaches: model-based and model-free.
| Feature | Model-Based RL | Model-Free RL |
|---|---|---|
| Environment Knowledge | Requires or learns a model of the environment (transition probabilities & rewards) | Does not require a model; learns from experience |
| Planning vs Learning | Can plan ahead using the model | Learns only from trial-and-error |
| Sample Efficiency | More sample-efficient (fewer interactions needed) | Less sample-efficient; needs more interactions |
| Computation | Often computationally intensive due to planning | Computationally simpler per step |
| Example Algorithms | Value Iteration, Policy Iteration, Dyna-Q | Q-Learning, SARSA, Monte Carlo methods |
| Adaptability | Can adapt quickly if the model is accurate | Slower adaptation; requires repeated exploration |
| Key Idea | “I know or learn the rules, so I can plan the best actions.” | “I don’t know the rules; I learn what works by trial-and-error.” |
A stochastic environment is one where the outcomes of an agent’s actions are probabilistic rather than deterministic. That is, taking the same action in the same state may lead to different next states or rewards. In such environments, an RL agent cannot rely on fixed outcomes and must learn policies that maximize expected cumulative reward rather than immediate reward.
How RL Agents Handle Stochasticity
1. Use of Probabilistic Value Functions
2. Discount Factor ($\gamma$): Balances immediate vs. future rewards, helping smooth out variability in stochastic outcomes.
3. Exploration Strategies: Policies like ε-greedy, softmax or Upper Confidence Bound (UCB) allow the agent to explore uncertain or probabilistic outcomes and improve learning.
4. Expected Reward Maximization: Instead of choosing actions that are best in one trial, the agent selects actions that maximize expected cumulative reward across all probabilistic outcomes.
5. Use of Model-Based or Model-Free Methods
Example: Grid world with slippery tiles:
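A slippery grid can be sketched, in one dimension for brevity, as a corridor where each move succeeds with probability 0.8 and sends the agent the opposite way otherwise (both numbers, and the corridor itself, are assumptions). Because the transition probabilities are given explicitly, this Python sketch plans with value iteration, a model-based method, and rates each action by its expected value over all slippery outcomes rather than by any single outcome.

```python
N_STATES = 4            # hypothetical corridor cells 0..3; cell 3 is the terminal goal
GOAL = N_STATES - 1
ACTIONS = {"left": -1, "right": +1}
P_INTENDED = 0.8        # assumed: move as intended with prob 0.8, slip the opposite way otherwise


def transitions(state, action):
    """Model of the slippery tile: list of (probability, next_state, reward)."""
    move = ACTIONS[action]
    outcomes = []
    for prob, direction in ((P_INTENDED, move), (1 - P_INTENDED, -move)):
        nxt = min(max(state + direction, 0), N_STATES - 1)   # walls keep the agent in bounds
        outcomes.append((prob, nxt, 1.0 if nxt == GOAL else 0.0))
    return outcomes


def action_value(v, state, action, gamma):
    """Expected reward plus discounted value, averaged over all slippery outcomes."""
    return sum(p * (r + gamma * v[nxt]) for p, nxt, r in transitions(state, action))


def value_iteration(gamma=0.9, tol=1e-10):
    v = [0.0] * N_STATES     # V(goal) stays 0: the episode ends there
    while True:
        delta = 0.0
        for s in range(N_STATES):
            if s == GOAL:
                continue
            best = max(action_value(v, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < tol:
            return v


def greedy_policy(v, gamma=0.9):
    return {s: max(ACTIONS, key=lambda a: action_value(v, s, a, gamma))
            for s in range(N_STATES) if s != GOAL}


values = value_iteration()
print([round(x, 3) for x in values])
print(greedy_policy(values))
```

Even though moving right sometimes slides the agent backwards, its expected value is still the highest in every cell, so the greedy policy is to move right everywhere.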
In Reinforcement Learning (RL), an agent interacts with an environment to maximize cumulative rewards. Three core concepts govern how the agent makes decisions and evaluates actions: policy, value function and reward function.
1. Policy ($\pi$) – The policy represents the agent’s strategy for choosing actions in different states. It tells the agent what to do in each situation. Policies can be deterministic, mapping each state to a single action ($a = \pi(s)$), or stochastic, giving a probability distribution over actions ($\pi(a \mid s)$).
2. Value Function ($V(s)$ or $Q(s, a)$) – The value function estimates how good a state or state-action pair is in terms of expected cumulative reward. It helps the agent evaluate long-term benefits of actions and make better decisions.
3. Reward Function ($R(s, a)$) – The reward function provides immediate feedback from the environment after the agent takes an action in a state. It measures short-term success and drives the learning process.
| Aspect | Policy | Value Function | Reward Function |
|---|---|---|---|
| Purpose | Strategy for selecting actions in each state | Estimates long-term expected returns | Provides immediate numerical feedback |
| Input | State information | State or (state, action) pair | State, action or state-action transition |
| Output | Action or distribution over actions | Expected value of future cumulative rewards | Instant reward signal |
| Role in Learning | Guides agent’s decision-making process | Assesses desirability of states/actions | Directs agent toward goals |
| Dependency | May depend on value/reward functions | Depends on policy and reward function | Independent, foundational signal |
| Optimization Goal | Learn optimal action-selection | Accurately predict future rewards | Shape agent behavior via rewards |
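The Expectation-Maximization (EM) algorithm is a classical, iterative optimization technique in artificial intelligence and statistics, used to estimate the parameters of probabilistic models, especially when the data involves hidden or latent variables. The algorithm works by alternating between two main steps: the Expectation (E) step, which uses the current parameter estimates to compute the posterior probabilities (expected values) of the latent variables, and the Maximization (M) step, which re-estimates the parameters to maximize the expected log-likelihood computed in the E-step.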
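As a concrete illustration of the E-step and M-step, the Python sketch below fits a two-component, one-dimensional Gaussian mixture, where the hidden variable is which component generated each point. The data, the initial guesses and the fixed iteration count are hypothetical; a fuller implementation would usually monitor the log-likelihood to decide when to stop.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_two_gaussians(data, means, variances, weights, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with EM from the given initial guesses."""
    means, variances, weights = list(means), list(variances), list(weights)
    for _ in range(iterations):
        # E-step: responsibility r[k] = P(component k | x) under the current parameters
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each component's parameters, weighting points by responsibility
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)   # floor to avoid a collapsed (zero-variance) component
            weights[k] = nk / len(data)
    return means, variances, weights


data = [0.1, -0.3, 0.4, 0.0, 0.2, 4.8, 5.1, 5.3, 4.9, 5.0]   # hypothetical observations
means, variances, weights = em_two_gaussians(
    data, means=[1.0, 4.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

On this well-separated data the estimated means settle near the two cluster averages (0.08 and 5.02) with mixing weights near 0.5.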
Key Concepts
Monte Carlo methods are statistical techniques that rely on repeated random sampling to solve complex problems which may be deterministic or probabilistic in nature. They are widely used in artificial intelligence (AI) for their ability to model uncertainty, simulate systems and approximate solutions where traditional analytical calculations are impractical.
Monte Carlo methods involve three core steps:
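A classic small example is estimating π: sample random points in the unit square, count how many land inside the quarter circle of radius 1, and scale the fraction by 4. A minimal Python sketch (the seed and sample sizes are arbitrary choices):

```python
import random


def estimate_pi(num_samples, seed=0):
    rng = random.Random(seed)
    inside = 0
    for _ in range(num_samples):
        # Draw a random point uniformly from the unit square
        x, y = rng.random(), rng.random()
        # Record whether it falls inside the quarter circle x^2 + y^2 <= 1
        if x * x + y * y <= 1.0:
            inside += 1
    # Aggregate: the hit fraction estimates the area ratio pi/4
    return 4 * inside / num_samples


for n in (100, 10_000, 100_000):
    print(n, estimate_pi(n))
```

The error shrinks roughly in proportion to $1/\sqrt{N}$ as the number of samples $N$ grows, which is typical of Monte Carlo estimates.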
Applications in AI
Forward state-space search in AI is a search strategy that starts from an initial state and explores the possible successor states by applying valid actions until a goal state is reached. It progressively moves forward state by state toward achieving the desired goal by methodically generating and evaluating new states.
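The following Python sketch performs forward search breadth-first, which is one possible strategy (depth-first search or A* could drive the same forward expansion). The toy domain, reaching 10 from 1 with the hypothetical actions "add1" and "double", is purely illustrative.

```python
from collections import deque


def forward_search(initial, is_goal, successors):
    """Breadth-first forward search from the initial state.

    successors(state) yields (action, next_state) pairs for the valid actions.
    Returns the list of actions leading to a goal, or None if no goal is reachable.
    """
    frontier = deque([initial])
    parent = {initial: None}   # maps each generated state to (previous_state, action)
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            plan = []
            while parent[state] is not None:   # walk back to the initial state
                prev, action = parent[state]
                plan.append(action)
                state = prev
            return plan[::-1]
        for action, nxt in successors(state):
            if nxt not in parent:              # skip states already generated
                parent[nxt] = (state, action)
                frontier.append(nxt)
    return None


def toy_successors(n):
    # Hypothetical actions, with states capped at 10
    if n + 1 <= 10:
        yield "add1", n + 1
    if n * 2 <= 10:
        yield "double", n * 2


print(forward_search(1, lambda n: n == 10, toy_successors))
```

For the toy domain this returns the four-step plan add1, double, add1, double (1 → 2 → 4 → 5 → 10).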
How it Works:
Advantages:
Local search optimization techniques are simple, practical methods used to find good solutions to complex problems by improving an initial solution step-by-step. They work by exploring the "neighbors" of a current solution—slightly changed versions—and moving to better ones until no improvement is found.
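A minimal steepest-ascent hill-climbing sketch in Python; the objective functions and neighbor definitions are hypothetical. The second example shows the typical weakness: started at position 0, it stops at a local peak.

```python
def hill_climb(start, objective, neighbors):
    """Steepest-ascent hill climbing: move to the best neighbor until none improves."""
    current = start
    while True:
        candidates = neighbors(current)
        if not candidates:
            return current
        best = max(candidates, key=objective)
        if objective(best) <= objective(current):
            return current          # local optimum: no neighbor is better
        current = best


def parabola(x):
    # Hypothetical single-peak objective over the integers 0..20, maximum at x = 7
    return -(x - 7) ** 2


def parabola_neighbors(x):
    return [y for y in (x - 1, x + 1) if 0 <= y <= 20]


LANDSCAPE = [1, 3, 2, 1, 4, 8, 5]   # hypothetical objective values by position; global max at 5


def landscape_value(i):
    return LANDSCAPE[i]


def landscape_neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(LANDSCAPE)]


print(hill_climb(0, parabola, parabola_neighbors))
print(hill_climb(0, landscape_value, landscape_neighbors))   # stuck on the local peak at position 1
print(hill_climb(4, landscape_value, landscape_neighbors))
```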
Common types include:
Applications:
Simulated annealing is an optimization algorithm inspired by the annealing process in metallurgy, designed to find an optimal or near-optimal solution in large and complex search spaces.
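Key formula for the probability of accepting a worse solution:
$$P(\text{accept}) = e^{-\Delta E / T}$$
where $\Delta E > 0$ is the increase in the objective function (for a minimization problem) and $T$ is the current temperature. Better solutions are always accepted; as $T$ decreases, worse moves become less likely to be accepted.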
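A minimal Python sketch of simulated annealing for minimization, accepting worse moves with probability $e^{-\Delta E / T}$ and cooling geometrically ($T \leftarrow 0.95\,T$). The cooling schedule, the parameters and the two-basin cost function are assumptions for illustration. Plain hill climbing (descent) started at 0 on this cost would stop at the local minimum $x = 3$.

```python
import math
import random


def simulated_annealing(start, cost, neighbor, t0=50.0, cooling=0.95,
                        steps_per_t=200, t_min=1e-3, seed=0):
    """Minimize `cost` by simulated annealing with geometric cooling; returns the best state seen."""
    rng = random.Random(seed)
    current = best = start
    t = t0
    while t > t_min:
        for _ in range(steps_per_t):
            candidate = neighbor(current, rng)
            delta = cost(candidate) - cost(current)      # Delta E > 0 means the move is worse
            # Accept improvements always; accept worse moves with probability exp(-Delta E / T)
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                current = candidate
                if cost(current) < cost(best):
                    best = current
        t *= cooling                                     # lower the temperature
    return best


def two_basin_cost(x):
    # Hypothetical cost on 0..20: local minimum at x = 3 (cost 2), global minimum at x = 15 (cost 0)
    return min((x - 3) ** 2 + 2, (x - 15) ** 2 / 4)


def step_neighbor(x, rng):
    # Move one unit left or right, staying inside 0..20
    return min(20, max(0, x + rng.choice((-1, 1))))


print(simulated_annealing(0, two_basin_cost, step_neighbor))
```

Early on, the high temperature lets the search climb out of the basin around $x = 3$; as $T$ falls it behaves more and more like greedy descent, and the best solution seen is returned.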
Advantages:
Iterative Deepening Search (IDS), also known as Iterative Deepening Depth-First Search (IDDFS), is a search algorithm used in artificial intelligence that combines the benefits of Depth-First Search (DFS) and Breadth-First Search (BFS). It is especially useful when the depth of the solution is unknown. IDS performs a series of depth-limited DFS searches, increasing the depth limit by one at each iteration until the goal is found or the entire search space is exhausted.
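A compact Python sketch of IDS on an explicit tree, here a hypothetical binary tree of depth 3. The `max_depth` cap is an added safeguard, and the function also returns the depth limit at which the goal was found.

```python
def depth_limited_search(graph, node, goal, limit, path):
    """DFS that does not descend more than `limit` further edges below `node`."""
    if node == goal:
        return path
    if limit == 0:
        return None
    for child in graph.get(node, []):
        if child not in path:   # avoid cycles along the current path
            result = depth_limited_search(graph, child, goal, limit - 1, path + [child])
            if result is not None:
                return result
    return None


def iterative_deepening_search(graph, start, goal, max_depth=50):
    """Run depth-limited DFS with limits 0, 1, 2, ... until the goal is found."""
    for limit in range(max_depth + 1):
        result = depth_limited_search(graph, start, goal, limit, [start])
        if result is not None:
            return result, limit
    return None, None


# Hypothetical binary tree (branching factor 2, depth 3)
tree = {
    "A": ["B", "C"],
    "B": ["D", "E"], "C": ["F", "G"],
    "D": ["H", "I"], "E": ["J", "K"], "F": ["L", "M"], "G": ["N", "O"],
}
print(iterative_deepening_search(tree, "A", "G"))
```

Shallow nodes are re-expanded on every iteration, but because most nodes of a tree sit at the deepest level, this repeated work adds only a constant factor while keeping DFS-like memory use.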
How IDS Works:
Example:
In a tree with branching factor 2 and depth 3:
A Truth Maintenance System (TMS) is an AI component that manages and maintains the consistency of beliefs and knowledge in a reasoning system. It tracks dependencies between facts, assumptions and conclusions, allowing the system to revise or retract beliefs when new information contradicts existing ones. Essentially, TMS helps maintain logical consistency in dynamic knowledge bases by recording justifications for each belief and updating conclusions as the context changes.
Commonsense reasoning refers to the human-like ability of an AI system to make presumptions about the everyday world, fill in gaps in knowledge and infer implicit facts that are obvious to humans based on general world knowledge.
Challenges of commonsense reasoning:
Let's see the differences between forward and backward planning,
| Aspect | Forward Planning | Backward Planning |
|---|---|---|
| Direction | Starts from initial state, moves forward | Starts from goal state, moves backward |
| Approach | Data-driven | Goal-driven |
| Search Process | Explores paths outward from the known starting conditions | Works back from the goal condition to find the necessary steps |
| Use Case | When initial state is well known | When goal or target state is clearly defined |
| Efficiency | May explore many unnecessary states | More focused on relevant states near goal |
| Memory & Computation | Can be less efficient if many paths are explored | Usually more directed, potentially more efficient |
| Advantage | Intuitive, straightforward | Useful when working backward from specific targets |
| Example | Robot starts at known position, finds path forward | Planning steps backward from desired endpoint |
Let's see the differences between on-policy and off-policy learning,
| Feature | On-Policy Learning | Off-Policy Learning |
|---|---|---|
| Definition | Learns value of the policy currently being followed by the agent | Learns value of a policy different from the one used to generate data |
| Policy Used for Learning | Same as the policy used to select actions (behavior policy = target policy) | Different from the policy used to select actions (behavior policy ≠ target policy) |
| Example Algorithms | SARSA | Q-Learning |
| How It Learns | Updates policy based on actions actually taken | Updates policy using best possible future actions, not necessarily the ones taken |
| Data Used | Data collected by current policy’s actions | Can use data from any policy, past experiences or other agents |
| Exploration | Must explore using the current policy | Can learn from exploratory or fixed datasets |
| Stability | Usually more stable and consistent | More flexible but can have higher variance |
| Efficiency | Can be less sample efficient, since it must learn from data generated by its current policy | Often more sample efficient, since it can reuse past or off-policy experience |
| Convergence | Converges under certain conditions, may be slower | Can converge faster, but stable learning is harder to ensure |
| Use Case | When learning and acting policies must be aligned | When learning from other agents or offline data |
| Intuition | Learning by doing | Learning by observing others or from past data |
Let's see the differences between global search and local search algorithms,
| Aspect | Global Search Algorithms | Local Search Algorithms |
|---|---|---|
| Search Scope | Explores the entire search space systematically | Explores the neighborhood of the current solution |
| Goal | Find the global optimum (best overall solution) | Find a good or near-optimal solution quickly |
| Approach | Broad, exhaustive or systematic | Incremental improvement based on local moves |
| Memory Usage | High, needs to store many states | Low, stores only current state and neighbors |
| Speed | Usually slower and computationally expensive | Generally faster and more efficient |
| Risk of Local Optima | Low, since global search covers full space | High, can get stuck in local optima |
| Examples | Breadth-First Search, A* Search | Hill Climbing, Simulated Annealing, Tabu Search |
| Application | Suitable when completeness and optimality are critical | Useful when solution space is huge or infinite |
Let's see the difference between gradient-based optimization and heuristic-based search,
| Aspect | Gradient-Based Optimization | Heuristic-Based Search |
|---|---|---|
| Basis | Uses derivatives (gradients) to guide search | Uses rules of thumb or domain knowledge |
| Requirement | Requires differentiable objective function | Works with non-differentiable, complex spaces |
| Search Direction | Moves toward steepest ascent/descent | Moves toward promising candidates using heuristic |
| Efficiency | Fast convergence on smooth, convex problems | Efficient in problems with complex landscapes |
| Risk of Local Optima | Can get stuck in local minima if the problem is multi-modal | Plain hill climbing can also get stuck; methods such as simulated annealing (randomness) or tabu search (memory) can escape local optima |
| Examples | Gradient Descent, Newton’s Method | A* Search, Hill Climbing, Genetic Algorithms |
| Applicability | Optimization problems with gradient information | Combinatorial optimization and heuristic search spaces |
Backtracking is a classic technique to solve constraint satisfaction problems like Sudoku. The approach is:
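A minimal backtracking solver in Python: fill the first empty cell with each value that does not violate the row, column or box constraints, recurse, and undo the choice if the recursion fails. The 4 × 4 puzzle is hypothetical; the same function handles a standard 9 × 9 grid.

```python
def solve_sudoku(board):
    """Solve an n x n Sudoku in place by backtracking (0 = empty); n must be a perfect square."""
    n = len(board)
    box = int(n ** 0.5)

    def is_valid(r, c, v):
        if v in board[r]:                                   # row constraint
            return False
        if any(board[i][c] == v for i in range(n)):         # column constraint
            return False
        br, bc = box * (r // box), box * (c // box)         # top-left corner of the box
        return all(board[i][j] != v for i in range(br, br + box) for j in range(bc, bc + box))

    for r in range(n):
        for c in range(n):
            if board[r][c] == 0:
                for v in range(1, n + 1):
                    if is_valid(r, c, v):
                        board[r][c] = v          # tentatively place a value
                        if solve_sudoku(board):  # recurse on the remaining empty cells
                            return True
                        board[r][c] = 0          # undo the choice (backtrack)
                return False  # no value fits this cell: backtrack to the caller
    return True  # no empty cells left: solved


puzzle = [
    [1, 0, 0, 0],
    [0, 0, 3, 0],
    [0, 4, 0, 0],
    [0, 0, 0, 2],
]
if solve_sudoku(puzzle):
    for row in puzzle:
        print(row)
```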
Alpha-beta pruning is an optimization of the minimax algorithm used in game-playing AIs, such as chess engines, to reduce the number of nodes evaluated in the game tree without affecting the final decision.
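A minimal Python sketch over a hypothetical game tree written as nested lists (leaves are scores from MAX's point of view). It compares plain minimax with alpha-beta, counting how many leaves alpha-beta actually evaluates.

```python
import math


def minimax(node, maximizing):
    """Plain minimax over a tree of nested lists whose leaves are scores."""
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, stats=None):
    """Minimax with alpha-beta pruning; stats['leaves'] counts the leaves actually evaluated."""
    if stats is None:
        stats = {"leaves": 0}
    if not isinstance(node, list):
        stats["leaves"] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta, stats))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # MIN above will never let play reach this branch
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, True, alpha, beta, stats))
        beta = min(beta, value)
        if alpha >= beta:
            break  # MAX above already has a better alternative
    return value


tree = [[[3, 5], [6, 9]], [[1, 2], [0, -1]]]   # hypothetical tree: MAX to move at the root
stats = {"leaves": 0}
print(minimax(tree, True), alphabeta(tree, True, stats=stats), stats["leaves"])
```

Both return 5, but alpha-beta evaluates only 5 of the 8 leaves: once the left MIN node can hold MAX to 5, the leaf 9 is irrelevant, and once the right branch is known to be worth at most 2, its second subtree is skipped entirely.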
Process:
Benefits:
A robot can navigate a maze using reinforcement learning (RL) by treating the maze as an environment where it learns an optimal policy to reach the goal through trial and error. Here’s how this works:
Key Components:
How Navigation Works:
Advantages:
Example:
Minimax is a recursive algorithm used in decision-making and game theory to make optimal moves. In Tic-Tac-Toe, it works by simulating all possible future moves and outcomes of the game. The AI (say player X) always tries to maximize its score by choosing moves that lead it closer to winning while assuming that the opponent (player O) will also play optimally and try to minimize the AI’s chances. This back-and-forth reasoning ensures that the AI always picks the best possible move, either to win or at least force a draw.
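A compact Python sketch of the idea, with the board as a list of nine cells and scores of +1 (X wins), -1 (O wins) and 0 (draw). It searches to the end of the game without depth-based scoring or pruning, which is feasible for Tic-Tac-Toe; the helper names and example positions are hypothetical.

```python
WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def minimax(board, player):
    """Score of `board` with `player` to move: +1 if X wins, -1 if O wins, 0 for a draw."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if " " not in board:
        return 0
    scores = []
    for i, cell in enumerate(board):
        if cell == " ":
            board[i] = player                        # simulate the move
            scores.append(minimax(board, "O" if player == "X" else "X"))
            board[i] = " "                           # undo it
    return max(scores) if player == "X" else min(scores)   # X maximizes, O minimizes


def best_move(board, player):
    """Return (index, score) of the move with the best minimax score for `player`."""
    best_score, best_index = None, None
    for i, cell in enumerate(board):
        if cell == " ":
            board[i] = player
            score = minimax(board, "O" if player == "X" else "X")
            board[i] = " "
            if best_score is None or (score > best_score if player == "X" else score < best_score):
                best_score, best_index = score, i
    return best_index, best_score


print(best_move(list("XX OO    "), "X"))   # X to move
print(best_move(list("XX  O    "), "O"))   # O to move
```

In the first position X completes the top row at index 2; in the second, O must block at index 2, after which best play by both sides leads to a draw (score 0).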
How Minimax Works in Tic-Tac-Toe
1. Evaluate terminal states:
2. Recursive exploration:
3. Backtracking: