

Unit II: String Matching Algorithms and Computational Geometry


Contents
  1. ⭐Sequential Search and Brute-Force String Matching
    1. Sequential Search
    2. Brute-Force String Matching
    3. Comparison Between Sequential Search & Brute-Force String Matching
  2. ⭐Closest-Pair and Convex-Hull Problem
    1. Closest-Pair Problem
    2. Convex Hull Problem
  3. ⭐Exhaustive Search
    1. Introduction to Exhaustive Search
    2. Uses of Exhaustive Search
    3. Types of Exhaustive Search
    4. Operations in Exhaustive Search
    5. Example Problems Using Exhaustive Search
    6. Limitations of Exhaustive Search
    7. Optimizing Exhaustive Search
  4. ⭐Voronoi Diagrams
    1. Introduction to Voronoi Diagrams
    2. Uses of Voronoi Diagrams
    3. Types of Voronoi Diagrams
    4. Construction of Voronoi Diagrams
    5. Operations in Voronoi Diagrams
    6. Limitations of Voronoi Diagrams
  5. ⭐Naïve String-Matching Algorithm
    1. Introduction to Naïve String-Matching Algorithm
    2. Uses of Naïve String-Matching Algorithm
    3. Working Principle of Naïve String-Matching Algorithm
    4. Time Complexity of Naïve String-Matching Algorithm
    5. Optimizations to Naïve String Matching
    6. Advantages & Disadvantages of Naïve String-Matching Algorithm
  6. ⭐Rabin-Karp Algorithm
    1. Introduction to Rabin-Karp Algorithm
    2. Uses of Rabin-Karp Algorithm
    3. Working Principle of Rabin-Karp Algorithm
    4. Hash Function in Rabin-Karp Algorithm
    5. Time Complexity of Rabin-Karp Algorithm
    6. Optimizations to Rabin-Karp Algorithm
    7. Advantages & Disadvantages of Rabin-Karp Algorithm
  7. ⭐Knuth-Morris-Pratt (KMP) Algorithm
    1. Introduction to Knuth-Morris-Pratt (KMP) Algorithm
    2. Uses of Knuth-Morris-Pratt (KMP) Algorithm
    3. Working Principle of KMP Algorithm
    4. LPS (Longest Prefix Suffix) Table
    5. Searching for the Pattern in the Text using KMP
    6. Time Complexity of KMP Algorithm
    7. Optimizations to KMP Algorithm
    8. Advantages & Disadvantages of KMP Algorithm
  8. ⭐Naïve Algorithm vs. Rabin-Karp Algorithm vs. KMP Algorithm
    1. Key Takeaways

⭐Sequential Search and Brute-Force String Matching


1.1 Introduction to Sequential Search

  • Definition:
    • Sequential search is a simple searching algorithm where each element in a list or array is checked one by one until the target value is found or the end of the list is reached.
  • Concept:
    • Start from the first element and compare it with the target value.
    • If a match is found, return the position.
    • If no match is found, move to the next element.
    • Repeat until the target is found or the list ends.

1.2 Uses of Sequential Search

  1. Unsorted Data Search:
    • Works well when the list is not sorted, unlike Binary Search which requires sorted data.
  2. Small Data Sets:
    • Suitable when the number of elements is small (as performance is O(n)).
  3. Finding Duplicates:
    • Used when searching for multiple occurrences of a value in a list.
  4. Linked Lists & Memory-based Data Structures:
    • When elements aren't stored in contiguous memory (as in linked lists).
  5. Pattern Matching:
    • Used in text processing to find words or patterns in documents or files.

1.3 Types of Sequential Search

1.3.1 Linear Search (Iterative Approach)
  • The simplest method where each element is checked one by one.
  • Best case: O(1) (when the target is the first element).
  • Worst case: O(n) (when the target is the last element or not present).
Logic of Linear Search
  1. Start from index 0.
  2. Compare each element with the target value.
  3. If a match is found, return the index.
  4. If the list ends, return -1 (not found).
Example (Find ‘5’ in {1, 3, 5, 7, 9})
  • Compare 1 → No match.
  • Compare 3 → No match.
  • Compare 5 → ✅ Match found at index 2
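
A minimal Python sketch of the iterative logic above (the function and variable names are illustrative, not from the notes):

```python
def linear_search(data, target):
    """Iterative sequential search: check each element one by one."""
    for index, value in enumerate(data):
        if value == target:
            return index      # match found: report its position
    return -1                 # list exhausted: target not present

print(linear_search([1, 3, 5, 7, 9], 5))  # -> 2, as in the example above
```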
1.3.2 Linear Search (Recursive Approach)
  • The same logic as iterative search, but implemented recursively.
  • The function calls itself until the element is found or the list ends.
Logic of Recursive Linear Search
  1. If the list is empty, return -1 (not found).
  2. Check the first element of the list.
  3. If it matches the target, return the index.
  4. Otherwise, search in the remaining list.
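
The same search written recursively, as a small sketch; it passes the current index forward instead of slicing the list, so it stays O(n):

```python
def recursive_search(data, target, index=0):
    """Recursive sequential search over data[index:]."""
    if index == len(data):     # list exhausted -> not found
        return -1
    if data[index] == target:  # current element matches -> done
        return index
    return recursive_search(data, target, index + 1)  # search the rest

print(recursive_search([1, 3, 5, 7, 9], 5))  # -> 2
```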
1.3.3 Sentinel Search (Optimized Sequential Search)
  • A small optimization of sequential search by placing the target at the end as a sentinel to eliminate boundary checks.
Logic of Sentinel Search
  1. Store the last element and replace it with the target (sentinel).
  2. Start searching from the first element.
  3. If a match is found, return the index.
  4. If the sentinel is reached, restore the original last element and return -1.
Example (Find ‘5’ in {1, 3, 5, 7, 9} using Sentinel)
  • Set sentinel at end: {1, 3, 5, 7, 5}
  • Start searching → Match found before reaching sentinel
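
A sketch of sentinel search following the steps above; it assumes a non-empty list and temporarily mutates it:

```python
def sentinel_search(data, target):
    """Sequential search with a sentinel: no bounds check inside the loop."""
    n = len(data)
    last = data[-1]           # remember the original last element
    data[-1] = target         # plant the sentinel so the loop must stop
    i = 0
    while data[i] != target:  # guaranteed to terminate at the sentinel
        i += 1
    data[-1] = last           # restore the original list
    if i < n - 1 or last == target:
        return i              # genuine match (possibly at the last slot)
    return -1                 # only the sentinel matched -> not found

print(sentinel_search([1, 3, 5, 7, 9], 5))  # -> 2
```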

2. Brute-Force String Matching

2.1 Introduction to Brute-Force String Matching

  • Definition:
    • A basic string searching algorithm that checks for a match character by character within a larger text.
  • Concept:
    • Given a text T of length n and a pattern P of length m,
    • Slide P across T one character at a time and check for a complete match.

2.2 Uses of Brute-Force String Matching

  1. Basic Text Search:
    • Searching for keywords in documents, log files, and emails.
  2. Pattern Detection in DNA Sequences:
    • Used in bioinformatics to find genetic patterns.
  3. Spam Filtering:
    • Detecting blacklisted words in emails.
  4. Simple Applications where Performance is Not Critical:
    • Works well in small datasets where efficiency is not a concern.

2.3 Steps of Brute-Force String Matching

  1. Start at index 0 in the text.
  2. Compare the first character of the pattern with the text.
  3. If all characters match, return the index.
  4. If a mismatch occurs, shift the pattern one position right.
  5. Repeat until the end of the text is reached.

2.4 Example of Brute-Force String Matching

Example (Find "abc" in "ababcabc")
  1. Compare "abc" with "aba" → ❌ Mismatch
  2. Compare "abc" with "bab" → ❌ Mismatch
  3. Compare "abc" with "abc" → ✅ Match found at position 2

2.5 Time Complexity of Brute-Force String Matching

  • Best Case: O(n) (if the pattern is found at the beginning).
  • Worst Case: O(m × n) (if every character has to be checked).

2.6 Optimizations of Brute-Force String Matching

2.6.1 Early Termination Optimization
  • If a mismatch occurs early, skip unnecessary comparisons.
  • Example:
    • Searching "xyz" in "abcdef" → Stop at first character mismatch.
2.6.2 Use of Preprocessing
  • Precompute pattern character frequency to decide where to shift next.
2.6.3 Using Hashing (Similar to Rabin-Karp Algorithm)
  • Instead of comparing characters, use a hash function to quickly compare patterns.

3. Comparison Between Sequential Search & Brute-Force String Matching

| Feature | Sequential Search | Brute-Force String Matching |
| --- | --- | --- |
| Definition | Search an element in a list | Search a substring in a text |
| Best Case Complexity | O(1) (element at start) | O(n) (match at start) |
| Worst Case Complexity | O(n) (element not found) | O(m × n) (every comparison needed) |
| Data Type | Works with numbers/lists | Works with text/strings |
| Use Case | Searching in arrays/lists | Searching in documents, databases |
| Efficiency | Works well for small lists | Inefficient for large texts |

⭐Closest-Pair and Convex-Hull Problem


1. Closest-Pair Problem

1.1 Introduction to Closest-Pair Problem

  • Definition:

    • The Closest-Pair Problem finds the two closest points in a given set of points on a 2D plane.
    • The distance between two points $(x_1, y_1)$ and $(x_2, y_2)$ is calculated using the Euclidean Distance Formula: $d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$
  • Concept:

    • Given n points in a 2D plane, find two points that have the smallest distance between them.

1.2 Uses of Closest-Pair Problem

  1. Geographical Mapping
    • Used in GPS navigation and nearest location detection (e.g., "Find the nearest ATM").
  2. Computer Vision
    • Used in image recognition for detecting closest object clusters.
  3. Clustering in Machine Learning
    • Helps in grouping similar data points based on proximity.
  4. Networking & Communication Systems
    • Used in wireless networks to find the nearest access point for connection.
  5. Astronomy & Physics Simulations
    • Used in space studies to find the closest celestial bodies.

1.3 Approaches to Solve the Closest-Pair Problem

1.3.1 Brute-Force Method (O(n²))
  • Definition:
    • The simplest method that compares every pair of points to find the closest one.
  • Steps:
    1. Compare each point with every other point.
    2. Calculate the Euclidean distance for each pair.
    3. Keep track of the minimum distance found.
Example Explanation (Find closest pair among A(1, 1), B(4, 2), C(3, 5))
  1. Compute distances:
    • d(A, B) = 3.16
    • d(A, C) = 4.47
    • d(B, C) = 3.16
  2. Closest pair = (A, B) or (B, C) with distance 3.16
  • Time Complexity: O(n²)
    • Since every point is compared with all other points, the number of comparisons is $\frac{n(n-1)}{2}$.
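
A brute-force sketch that checks all n(n−1)/2 pairs, reproducing the example above (math.dist computes the Euclidean distance):

```python
from math import dist  # Euclidean distance (Python 3.8+)

def closest_pair_brute_force(points):
    """Compare every pair of points and keep the minimum distance."""
    best, pair = float("inf"), None
    for i in range(len(points)):
        for j in range(i + 1, len(points)):   # n(n-1)/2 comparisons
            d = dist(points[i], points[j])
            if d < best:
                best, pair = d, (points[i], points[j])
    return pair, best

print(closest_pair_brute_force([(1, 1), (4, 2), (3, 5)]))
# -> ((1, 1), (4, 2)) at distance ~3.16, matching the example
```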
1.3.2 Divide and Conquer Method (O(n log n))
  • Definition:
    • A more efficient approach that divides the set of points into two halves, finds the closest pair in each half, and merges the results.
  • Steps:
    1. Sort the points based on x-coordinates.
    2. Divide the points into two halves.
    3. Find the closest pair in both halves recursively.
    4. Merge step: Find the closest pair across the two halves.
    5. Return the minimum distance found.
Example Explanation (Find closest pair among A(1, 1), B(4, 2), C(3, 5), D(7, 8))
  1. Sort points by x-coordinates → {A(1,1), C(3,5), B(4,2), D(7,8)}
  2. Divide into two halves: {A, C} and {B, D}
  3. Find closest pairs in both halves separately.
  4. Merge step: Check boundary points for a closer pair.
  • Time Complexity: O(n log n)
    • Sorting takes O(n log n), and merging takes O(n).
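
A compact divide-and-conquer sketch of the steps above; it returns only the minimum distance and assumes at least two points (a production version would also track the pair itself):

```python
from math import dist

def closest_pair_distance(points):
    pts = sorted(points)                      # sort once by x-coordinate

    def solve(p):
        n = len(p)
        if n <= 3:                            # small base case: brute force
            return min(dist(a, b) for i, a in enumerate(p) for b in p[i + 1:])
        mid = n // 2
        mid_x = p[mid][0]
        d = min(solve(p[:mid]), solve(p[mid:]))   # best within each half
        # Merge step: only points within d of the dividing line can do better.
        strip = sorted((q for q in p if abs(q[0] - mid_x) < d),
                       key=lambda q: q[1])    # order the strip by y
        for i, a in enumerate(strip):
            for b in strip[i + 1:i + 8]:      # at most ~7 neighbors matter
                d = min(d, dist(a, b))
        return d

    return solve(pts)

print(closest_pair_distance([(1, 1), (4, 2), (3, 5), (7, 8)]))  # -> ~3.162
```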

2. Convex Hull Problem

2.1 Introduction to Convex Hull

  • Definition:
    • A Convex Hull is the smallest convex polygon that can enclose a given set of points in a plane.
    • It is like stretching a rubber band around the given points, and the points touching the band form the Convex Hull.

2.2 Uses of Convex Hull

  1. Geographic Information Systems (GIS)
    • Used to find boundaries of a city or country on a map.
  2. Computer Graphics & Image Processing
    • Used in object detection and shape recognition.
  3. Collision Detection in Games
    • Used in game development for detecting boundaries of objects.
  4. Machine Learning & Data Clustering
    • Used in defining data boundaries in unsupervised learning.
  5. Robotics & Navigation
    • Helps robots avoid obstacles by mapping their surroundings.

2.3 Approaches to Solve the Convex Hull Problem

2.3.1 Brute-Force Approach (O(n³))
  • Definition:

    • A basic method where every pair of points is checked to see if it forms part of the hull.
  • Steps:

    1. Consider all pairs of points and check if they form a hull edge.
    2. The pair forms a hull edge if all other points lie on the same side of the line through the two points.
    3. Collect all such boundary edges to form the convex hull.
  • Time Complexity: O(n³)

    • Since each pair of points is checked with all other points.
2.3.2 Graham’s Scan Algorithm (O(n log n))
  • Definition:
    • A more efficient method that sorts points based on angles and constructs the hull using a stack.
  • Steps:
    1. Find the lowest point (P0) (smallest y-coordinate).
    2. Sort remaining points based on the angle with P0.
    3. Use a stack to construct the hull by checking left-turns and right-turns.
Example Explanation (Find Convex Hull for A(1,1), B(4,2), C(3,5), D(7,8))
  1. Select lowest point → P0 = A(1,1)
  2. Sort remaining points by angle with P0 → {B, D, C}
  3. Use Stack:
    • Push A, then B.
    • Push D (A→B→D is a left turn).
    • Push C (B→D→C is a left turn); whenever a right turn is detected, the last point is popped from the stack.
  4. Final hull (counter-clockwise): A → B → D → C (see the sketch below).
  • Time Complexity: O(n log n)
    • Sorting takes O(n log n), and scanning takes O(n).
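
A sketch of Graham's scan following those steps; the cross-product helper decides left vs. right turns, and collinear points are dropped:

```python
from math import atan2

def cross(o, a, b):
    """Cross product of OA x OB; positive means a left (counter-clockwise) turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def graham_scan(points):
    p0 = min(points, key=lambda p: (p[1], p[0]))   # lowest point (then leftmost)
    rest = sorted((p for p in points if p != p0),
                  key=lambda p: atan2(p[1] - p0[1], p[0] - p0[0]))  # by angle
    hull = [p0]
    for p in rest:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()              # right turn (or collinear): discard
        hull.append(p)
    return hull

print(graham_scan([(1, 1), (4, 2), (3, 5), (7, 8)]))
# -> [(1, 1), (4, 2), (7, 8), (3, 5)], the counter-clockwise hull
```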
2.3.3 Jarvis March Algorithm (O(nh))
  • Definition:
    • A method that selects points one by one by finding the leftmost extreme point.
  • Steps:
    1. Start with leftmost point.
    2. Select the next extreme point that forms the smallest left turn.
    3. Repeat until the starting point is reached.

Example Explanation (Find Convex Hull for A(1,1), B(4,2), C(3,5), D(7,8))

  1. Start with the leftmost point (A).
  2. Wrap to the next extreme point (B).
  3. Continue wrapping: D, then C.
  4. Return to the starting point to complete the hull A → B → D → C (see the sketch below).
  • Time Complexity: O(nh)
    • h = number of hull points, making it efficient for small hulls.
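
A gift-wrapping (Jarvis march) sketch; it assumes at least three points, not all collinear:

```python
def cross(o, a, b):
    # positive -> turning from o->a to o->b is counter-clockwise
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def jarvis_march(points):
    start = min(points)                    # leftmost point
    hull, current = [], start
    while True:
        hull.append(current)
        # start with any candidate other than the current point
        candidate = points[0] if points[0] != current else points[1]
        for p in points:
            # take p if it is more counter-clockwise than the candidate
            if p != current and cross(current, candidate, p) < 0:
                candidate = p
        current = candidate
        if current == start:               # wrapped back around: done
            break
    return hull

print(jarvis_march([(1, 1), (4, 2), (3, 5), (7, 8)]))
# -> [(1, 1), (4, 2), (7, 8), (3, 5)]
```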


⭐Exhaustive Search


1. Introduction to Exhaustive Search

  • Definition:

    • Exhaustive search (also called brute-force search) is a problem-solving approach where all possible solutions are systematically explored to find the best or correct one.
    • It is guaranteed to find the optimal solution but may be computationally expensive for large inputs.
  • Concept:

    • Given a problem, try every possible combination and select the best one based on the given constraints.
    • Used when no efficient algorithm exists or when ensuring optimality is necessary.

2. Uses of Exhaustive Search

  1. Finding the Best Solution in Optimization Problems
    • Example: Traveling Salesman Problem (TSP) – Checking all routes to find the shortest path.
  2. Brute-Force Pattern Matching
    • Example: Checking all possible positions of a substring within a text.
  3. Breaking Cryptographic Algorithms
    • Example: Brute-force password cracking by trying every combination.
  4. Solving NP-Complete Problems
    • Example: Graph Coloring, Knapsack Problem, and other combinatorial problems.
  5. Path Finding in Games & AI
    • Example: Finding the best move in Chess by evaluating all possibilities.

3. Types of Exhaustive Search

3.1 Complete Exhaustive Search

  • Tries all possible solutions and guarantees an optimal answer.
  • Example: Checking all paths in a maze to find the shortest one.

3.2 Partial Exhaustive Search

  • Tries a subset of solutions when full exploration is impractical.
  • Example: AI in Chess, where only likely good moves are explored.

3.3 Optimized Exhaustive Search

  • Applies pruning techniques to eliminate unlikely solutions.
  • Example: Branch and Bound, which skips unnecessary calculations.

4. Operations in Exhaustive Search

4.1 Generating All Possible Solutions

  • Definition:
    • Systematically list all potential candidates that could be a solution.

Logic to Generate All Solutions

  1. Identify the set of possible choices.
  2. Construct a recursive or iterative approach to list all possible combinations.
  3. Store each generated solution for evaluation.

Example (Generate all subsets of {1, 2, 3})

  • Possible subsets: { {}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3} }

  • Time Complexity: O(2ⁿ) (as each element can be included or excluded).
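
A one-line sketch of subset generation with itertools (each element is either included or excluded, hence 2ⁿ subsets):

```python
from itertools import combinations

def all_subsets(items):
    """Enumerate all 2^n subsets of items, smallest first."""
    return [set(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

print(all_subsets([1, 2, 3]))
# -> [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
```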

4.2 Checking Feasibility of Each Solution

  • Definition:
    • After generating a solution, check whether it satisfies the problem’s constraints.

Logic to Check Feasibility

  1. Verify that the solution does not violate problem constraints.
  2. If invalid, discard the solution and proceed to the next.

Example (Knapsack Problem - Can a subset of weights sum to 10?)

  • Given {4, 6, 8, 3}, check:

    • {4, 6} → 10 ✅
    • {8, 3} → 11 ❌ (Discarded)
  • Time Complexity: O(2ⁿ) (Checking all subsets).
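
A sketch combining generation and the feasibility check for the subset-sum example above:

```python
from itertools import combinations

def subset_summing_to(weights, target):
    """Return the first subset whose weights sum to target, else None."""
    for r in range(1, len(weights) + 1):
        for subset in combinations(weights, r):  # generate a candidate
            if sum(subset) == target:            # feasibility check
                return subset                    # feasible -> accept
    return None                                  # all candidates discarded

print(subset_summing_to([4, 6, 8, 3], 10))  # -> (4, 6), as in the example
```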

4.3 Selecting the Best Solution

  • Definition:
    • After checking feasibility, pick the optimal solution based on the given criteria.

Logic to Select the Best Solution

  1. Keep track of the best (or shortest, lowest cost, highest value) solution found so far.
  2. Compare each new solution with the current best solution.
  3. If better, update the best solution.

Example (Find the shortest route among all possible paths in a graph)

  • Paths: {A→B→C = 10, A→C→B = 8, A→B→D→C = 15}

  • Best solution: {A→C→B = 8}

  • Time Complexity: O(n!) (as all paths are checked).

5. Example Problems Using Exhaustive Search

5.1 Traveling Salesman Problem (TSP)

  • Problem Statement:
    • A salesman must visit n cities, traveling the shortest possible route while visiting each city exactly once and returning to the starting point.
  • Solution Using Exhaustive Search:
    1. Generate all possible city visit orders.
    2. Compute the total travel distance for each route.
    3. Choose the shortest route.
  • Time Complexity: O(n!) (since all permutations of cities are checked).

Example (TSP for 3 cities: A, B, C)

  • Possible routes: {A→B→C→A, A→C→B→A, B→A→C→B, ...}
  • Compute distances for all routes.
  • Select the shortest route ✅.
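
An exhaustive TSP sketch over all permutations; the city coordinates are illustrative, not from the notes:

```python
from itertools import permutations
from math import dist

def tsp_brute_force(cities):
    """cities: dict name -> (x, y). Checks all (n-1)! tours from a fixed start."""
    names = list(cities)
    start, rest = names[0], names[1:]
    best_route, best_len = None, float("inf")
    for order in permutations(rest):              # every visiting order
        route = [start, *order, start]            # return to the start
        length = sum(dist(cities[a], cities[b])
                     for a, b in zip(route, route[1:]))
        if length < best_len:                     # keep the best so far
            best_route, best_len = route, length
    return best_route, best_len

print(tsp_brute_force({"A": (0, 0), "B": (3, 0), "C": (3, 4)}))
# -> (['A', 'B', 'C', 'A'], 12.0) for this 3-4-5 triangle
```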

5.2 Knapsack Problem (Subset-Sum Problem)

  • Problem Statement:
    • Given n items with weights and values, find the subset that maximizes value without exceeding a given weight limit W.
  • Solution Using Exhaustive Search:
    1. Generate all possible subsets of items.
    2. Compute the total weight and value for each subset.
    3. Select the subset with the highest value that satisfies the weight limit.
  • Time Complexity: O(2ⁿ).

Example (Items: {A(3kg, $6), B(2kg, $5), C(4kg, $8)}, Weight Limit = 5kg)

  • Possible subsets: { {}, {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C} }
  • Compute total weight and value for each subset.
  • Select the best subset within 5kg ✅.
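
A 2ⁿ-subset knapsack sketch matching the example (weights in kg, values in $):

```python
from itertools import combinations

def knapsack_brute_force(items, limit):
    """items: list of (name, weight, value); try every subset."""
    best_value, best_subset = 0, ()
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            weight = sum(w for _, w, _ in subset)
            value = sum(v for _, _, v in subset)
            if weight <= limit and value > best_value:
                best_value, best_subset = value, subset
    return best_subset, best_value

items = [("A", 3, 6), ("B", 2, 5), ("C", 4, 8)]
print(knapsack_brute_force(items, 5))
# -> ((('A', 3, 6), ('B', 2, 5)), 11): A+B weighs exactly 5kg, worth $11
```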

5.3 Graph Coloring Problem

  • Problem Statement:
    • Assign colors to graph vertices such that no two adjacent vertices share the same color, using the fewest colors possible.
  • Solution Using Exhaustive Search:
    1. Try all color assignments for the vertices.
    2. If a coloring violates adjacency constraints, discard it.
    3. Choose the solution with the minimum colors.
  • Time Complexity: O(kⁿ) (where k is the number of colors).

Example (Graph with vertices A, B, C, and edges A-B, B-C, C-A)

  • Possible colorings:
    • {A=Red, B=Green, C=Blue}
    • {A=Red, B=Red, C=Blue} ❌ (Invalid, A & B have the same color).
  • Select the valid coloring with the fewest colors ✅.
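
A sketch that tries all kⁿ assignments for a fixed palette; to minimize colors, call it with palettes of increasing size until a valid coloring appears:

```python
from itertools import product

def color_graph(vertices, edges, colors):
    """Return the first assignment where no edge joins equal colors."""
    for assignment in product(colors, repeat=len(vertices)):  # k^n tries
        coloring = dict(zip(vertices, assignment))
        if all(coloring[u] != coloring[v] for u, v in edges):
            return coloring       # valid: adjacency constraint satisfied
    return None                   # no valid coloring with this palette

print(color_graph(["A", "B", "C"],
                  [("A", "B"), ("B", "C"), ("C", "A")],
                  ["Red", "Green", "Blue"]))
# -> {'A': 'Red', 'B': 'Green', 'C': 'Blue'} (a triangle needs 3 colors)
```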

6. Limitations of Exhaustive Search

  1. Exponential Growth
    • In large datasets, checking all possibilities becomes impractical.
  2. Computationally Expensive
    • Problems like TSP and Knapsack take O(n!) or O(2ⁿ) time.
  3. Inefficiency in Real-World Applications
    • Other algorithms (e.g., Dynamic Programming, Greedy Algorithms) are faster and more efficient.

7. Optimizing Exhaustive Search

  • Backtracking: Eliminates unnecessary paths early (e.g., N-Queens Problem).
  • Branch and Bound: Cuts off suboptimal branches in optimization problems.
  • Heuristic Approaches: Uses approximation (e.g., Genetic Algorithms, Simulated Annealing).

⭐Voronoi Diagrams


1. Introduction to Voronoi Diagrams

  • Definition:
    • A Voronoi diagram is a geometric data structure that partitions a plane into regions based on proximity to a given set of points.
    • Each region contains all points that are closer to one specific seed point than any other.
  • Concept:
    • Given a set of n points (called sites or generators) in a 2D plane, the plane is divided into n convex polygons where:
      • Each polygon contains points closest to a specific site.
      • The boundaries between regions are perpendicular bisectors of the segments between seed points.

Example Explanation

  • Suppose we have three points:
    • A(1,3), B(4,5), C(6,2)
  • The Voronoi diagram consists of three regions, where:
    • Region A contains all points closest to A.
    • Region B contains all points closest to B.
    • Region C contains all points closest to C.
  • The boundaries between these regions are straight lines equidistant from two seed points.

2. Uses of Voronoi Diagrams

  1. Geographical Mapping & Nearest Neighbor Search
    • Used in city planning, cell tower placement, and nearest hospital detection.
  2. Computational Geometry & Artificial Intelligence
    • Used in path planning for robots and AI-based decision-making.
  3. Network Optimization & Wireless Communication
    • Helps in optimal placement of cell towers to maximize coverage.
  4. Medical Imaging & Biology
    • Used in analyzing tissue structures and cell distribution.
  5. Computer Graphics & Game Development
    • Used in procedural terrain generation and AI navigation systems.
  6. Astronomy & Physics Simulations
    • Used for modeling galaxy distributions and crystal structures.

3. Types of Voronoi Diagrams

3.1 Ordinary Voronoi Diagram (Basic Type)

  • Each point has one region where all points are closest to it.

3.2 Weighted Voronoi Diagram

  • Some points have higher influence (priority) and expand their regions accordingly.
  • Used in meteorology for rain and wind distribution modeling.

3.3 Additively Weighted Voronoi Diagram (Power Diagram)

  • The distance is modified by an additive weight, changing region sizes.

3.4 Multiplicatively Weighted Voronoi Diagram

  • Each site has a scaling factor, adjusting distance computations.
  • Used in economic models where regions represent market dominance.

3.5 K-Order Voronoi Diagram

  • Instead of finding the closest site, this finds the k nearest sites to each point.
  • Used in logistics and emergency service planning.

3.6 Farthest-Point Voronoi Diagram

  • Instead of dividing space by closest points, it partitions based on farthest points.
  • Used in evacuation route planning.

4. Construction of Voronoi Diagrams

4.1 Brute-Force Construction (O(n²))

  • Definition:
    • The simplest approach that checks the distance of every point to all given sites.
  • Steps:
    1. For each pixel or grid point in the plane, compute its distance to all given sites.
    2. Assign the point to the closest site.
    3. The boundaries where distances are equal form Voronoi edges.
  • Time Complexity: O(n²)
    • Since every point checks its distance to all other points.

4.2 Fortune’s Algorithm (O(n log n))

  • Definition:
    • A sweep-line algorithm that constructs Voronoi diagrams efficiently in O(n log n) time.
  • Steps:
    1. Sort all points by their y-coordinates.
    2. Sweep a horizontal line from top to bottom, maintaining a dynamic beach line of parabolic arcs.
    3. When the sweep-line encounters a site, insert an arc into the beach line.
    4. When neighboring arcs converge (a circle event), remove the squeezed arc; the meeting point becomes a Voronoi vertex.
    5. Store the boundary points to construct the final Voronoi diagram.
  • Time Complexity: O(n log n)
    • Sorting takes O(n log n), and maintaining the beach line takes O(n).

Example Explanation

  • Given points A(2,3), B(5,7), C(8,2):
    1. Start sweeping top to bottom.
    2. Maintain a parabolic beach line for active sites.
    3. Insert Voronoi boundaries at equal-distance intersections.
    4. Continue until the entire plane is partitioned.

4.3 Incremental Algorithm (O(n log n))

  • Definition:
    • Adds one site at a time and updates the Voronoi diagram incrementally.
  • Steps:
    1. Start with a single site.
    2. Insert new points one by one.
    3. Adjust Voronoi boundaries dynamically.
    4. Continue until all points are added.
  • Time Complexity: O(n log n)
    • Since each insertion requires local adjustments in O(log n) time.

5. Operations in Voronoi Diagrams

5.1 Finding Nearest Neighbor (Point Location Query)

  • Definition:
    • Given a query point, determine which Voronoi region it belongs to.
  • Logic:
    1. Compute the distance of the query point to each site.
    2. Find the minimum distance and assign the point to that region.
  • Time Complexity: O(log n) (using a spatial search structure).

Example Explanation (Find nearest point to Q(3,4))

  • Given sites A(1,1), B(5,5), C(7,3),
  • Compute distances:
    • d(Q, A) = 3.6
    • d(Q, B) = 2.2 ✅ (Nearest)
    • d(Q, C) = 4.1
  • Point Q belongs to Voronoi region of B ✅.
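
A brute-force point-location sketch for this query; it runs in O(n) per query, whereas the O(log n) bound above assumes a prebuilt spatial search structure:

```python
from math import dist

def voronoi_region(query, sites):
    """Return the name of the site whose Voronoi region contains the query point."""
    return min(sites, key=lambda name: dist(query, sites[name]))

sites = {"A": (1, 1), "B": (5, 5), "C": (7, 3)}
print(voronoi_region((3, 4), sites))  # -> 'B', matching the example
```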

5.2 Voronoi Edge Detection

  • Definition:
    • Finds boundaries between Voronoi regions.
  • Logic:
    1. Compute perpendicular bisectors between each pair of sites.
    2. Find their intersection points.
    3. Store these points as Voronoi edges.
  • Time Complexity: O(n log n)

5.3 Merging Two Voronoi Diagrams

  • Definition:
    • Combines two separate Voronoi diagrams into one.
  • Logic:
    1. Compute Voronoi diagrams independently.
    2. Overlay them on the same plane.
    3. Merge edges and resolve intersections.
  • Time Complexity: O(n log n)

6. Limitations of Voronoi Diagrams

  1. Sensitive to Input Data
    • Small changes in site positions drastically alter Voronoi regions.
  2. Computationally Expensive for Large Data Sets
    • O(n log n) complexity still grows large for millions of sites.
  3. Not Suitable for Dynamic Environments
    • Requires recomputation if points are frequently added/removed.

⭐Naïve String-Matching Algorithm


1. Introduction to Naïve String-Matching Algorithm

  • Definition:
    • The Naïve String-Matching Algorithm is a basic pattern searching algorithm that checks for a match character by character within a given text.
    • It slides the pattern one character at a time over the text and compares each substring with the pattern.
  • Concept:
    • Given a text T of length n and a pattern P of length m, the algorithm compares P with every substring of T.
    • If a match is found, return the index of occurrence.

Example Explanation

  • Given:
    • Text (T) = "ABABABCABABABC"
    • Pattern (P) = "ABABC"
  • The algorithm starts comparing the pattern with each substring in the text:
    • Compare "ABABC" with "ABABA"Mismatch (shift right)
    • Compare "ABABC" with "BABAB"Mismatch (shift right)
    • Compare "ABABC" with "ABABC"Match found at index 2

2. Uses of Naïve String-Matching Algorithm

  1. Text Searching & Information Retrieval
    • Finding words in documents (e.g., searching for "error" in a log file).
  2. Pattern Recognition in DNA Sequences
    • Searching for genetic patterns in DNA sequences.
  3. Spam Detection & Text Filtering
    • Detecting blacklisted words in emails.
  4. String Searching in Databases
    • Locating specific records in large datasets.
  5. Search Operations in Text Editors
    • Find and replace functionalities in word processors.

3. Working Principle of Naïve String-Matching Algorithm

3.1 Steps of Naïve String Matching

  1. Start with index 0 in the text.
  2. Compare the first character of the pattern with the current character in the text.
  3. If a character mismatch occurs, shift the pattern one step to the right.
  4. If all characters match, return the index of occurrence.
  5. Repeat until the end of the text is reached.

3.2 Example of Naïve String-Matching Algorithm

Example (Find "AB" in "ABACABAB")

Step-by-step matching process:

  1. Compare "AB" with "AB" → ✅ Match found at index 0.
  2. Shift right and compare "AB" with "BA" → ❌ Mismatch.
  3. Shift right and compare "AB" with "AC" → ❌ Mismatch.
  4. Shift right and compare "AB" with "CA" → ❌ Mismatch.
  5. Shift right and compare "AB" with "AB" → ✅ Match found at index 5.

Final Output: Pattern occurs at indices {0, 5}
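
A sketch of the naïve matcher that reports every occurrence, reproducing the output above:

```python
def naive_search_all(text, pattern):
    """Slide the pattern one position at a time; collect all match indices."""
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):
        if text[i:i + m] == pattern:   # character-by-character comparison
            matches.append(i)
    return matches

print(naive_search_all("ABACABAB", "AB"))  # -> [0, 4, 6]
```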

4. Time Complexity of Naïve String-Matching Algorithm

  • Best Case: O(n)
    • Occurs when the pattern matches in the first few comparisons or the first letter of the pattern rarely appears in the text.
  • Worst Case: O(m × n)
    • Occurs when every alignment must be checked character by character (e.g., searching for "AAA" in "AAAAA").

5. Optimizations to Naïve String Matching

5.1 Early Termination Optimization

  • If a mismatch occurs early, skip unnecessary comparisons and shift right.
  • Example:
    • Searching "XYZ" in "ABCDEF" → Stop at first mismatch.

5.2 Using Frequency Analysis

  • If the last character of the pattern is rare in the text, shift multiple steps at once.
  • Example:
    • Searching "ZZZ" in "ABCDEFGH" → Shift immediately after first mismatch.

5.3 Use of Hashing (Similar to Rabin-Karp Algorithm)

  • Instead of comparing character by character, use a hash function to quickly compare patterns.

6. Advantages & Disadvantages of Naïve String-Matching Algorithm

Advantages

✔ Simple & Easy to Implement
✔ Works well for small texts & patterns
✔ No extra memory needed (compared to KMP)

Disadvantages

❌ Inefficient for Large Texts (O(m × n) worst case)
❌ Repeated Comparisons increase time complexity
❌ Not practical for long pattern searches


⭐Rabin-Karp Algorithm


1. Introduction to Rabin-Karp Algorithm

  • Definition:

    • The Rabin-Karp Algorithm is a string-searching algorithm that uses hashing to efficiently search for a pattern in a text.
    • Instead of comparing characters one by one like the Naïve approach, it computes a hash value for the pattern and substrings of the text.
    • If the hash values match, it performs a character-by-character check to confirm a valid match.
  • Concept:

    • Given a text T of length n and a pattern P of length m, the algorithm:
      1. Computes the hash of P (pattern).
      2. Computes the hash of the first m characters of T (substring).
      3. Slides the pattern over the text by one position at a time, updating the hash value dynamically.
      4. If the hash matches, performs a character-by-character check to confirm.

2. Uses of Rabin-Karp Algorithm

  1. String Searching in Large Texts
    • Used in search engines to match queries quickly.
  2. Detecting Plagiarism
    • Used to compare chunks of text across documents.
  3. Pattern Matching in DNA Sequences
    • Used in bioinformatics to locate gene sequences.
  4. Spam Filtering
    • Identifies spam messages based on word patterns.
  5. Searching Multiple Patterns in a Single Pass
    • Unlike Naïve and KMP algorithms, Rabin-Karp can search multiple patterns at once.

3. Working Principle of Rabin-Karp Algorithm

3.1 Steps of Rabin-Karp Algorithm

  1. Compute the hash of the pattern.
  2. Compute the hash of the first substring of the text.
  3. Slide the window over the text one character at a time, updating the hash dynamically.
  4. If the hash matches, perform a character-by-character check to confirm the match.
  5. Continue until the entire text is scanned.

3.2 Example of Rabin-Karp Algorithm

Example (Find "AB" in "ABACABAB")

  • Given:
    • Text (T) = "ABACABAB"
    • Pattern (P) = "AB"
  • Step-by-step process:
  1. Compute the hash of "AB"Hash(P) = H1
  2. Compute the hash of first two characters in T ("AB") → Hash(T[0:2]) = H2
  3. Compare H1 and H2:
    • If H1 ≠ H2, shift pattern right and recompute hash.
    • If H1 = H2, do a character-by-character check.
  4. Repeat until the end of the text.

Final Output: Pattern occurs at indices {0, 4, 6}

4. Hash Function in Rabin-Karp Algorithm

  • Definition:
    • A hash function is used to convert a string into a numerical value for quick comparisons.
  • Rolling Hash Technique:
    • Instead of recalculating the entire hash from scratch for each shift, use a rolling hash function to update the hash in O(1) time.

Rolling Hash Formula (Modular Arithmetic)

$H_{\text{new}} = \big(d \times (H_{\text{old}} - \text{first char} \times d^{m-1}) + \text{new char}\big) \bmod q$

Where:

  • d = base (e.g., 256 for ASCII characters).
  • m = length of the pattern.
  • q = a prime number to reduce hash collisions.

Example Explanation (Sliding Hash Update)

  • Assume P = "AB" and T = "ABACABAB".
  • Compute hash of "AB" (H1).
  • Compute hash of "BA" (H2) by removing the first character and adding the next one.
  • Compute hash of "AC" (H3) similarly.
  • Use modulo operation (% q) to keep values small and reduce collisions.
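
A Rabin-Karp sketch using the rolling hash above; d = 256 and the small prime q = 101 are illustrative choices, not values fixed by the notes:

```python
def rabin_karp(text, pattern, d=256, q=101):
    """Find all occurrences of pattern in text using a rolling hash."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)          # weight of the leading character, mod q
    p_hash = t_hash = 0
    for i in range(m):            # initial hashes of pattern and first window
        p_hash = (d * p_hash + ord(pattern[i])) % q
        t_hash = (d * t_hash + ord(text[i])) % q
    matches = []
    for i in range(n - m + 1):
        if p_hash == t_hash and text[i:i + m] == pattern:
            matches.append(i)     # hash hit, confirmed character by character
        if i < n - m:             # roll: drop text[i], append text[i + m]
            t_hash = (d * (t_hash - ord(text[i]) * h) + ord(text[i + m])) % q
    return matches

print(rabin_karp("ABACABAB", "AB"))  # -> [0, 4, 6]
```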

5. Time Complexity of Rabin-Karp Algorithm

  • Best Case: O(n)
    • Occurs when there are few hash collisions, requiring fewer character comparisons.
  • Worst Case: O(m × n)
    • Occurs when hash collisions happen frequently, leading to multiple character comparisons.

6. Optimizations to Rabin-Karp Algorithm

6.1 Using a Good Hash Function

  • A prime number (q) helps reduce hash collisions.
  • Using higher base values (d) minimizes repeated hash values.

6.2 Avoiding Hash Collisions

  • If multiple substrings have the same hash, do a full character comparison before confirming a match.
  • Increase q to a larger prime number to reduce duplicate hash values.

6.3 Searching for Multiple Patterns Simultaneously

  • Store hashes of multiple patterns and compare them in a single pass.
  • Useful in plagiarism detection and spam filtering.

7. Advantages & Disadvantages of Rabin-Karp Algorithm

Advantages

✔ Efficient for multiple pattern searching
✔ Faster than Naïve algorithm in the best case (O(n))
✔ Can be used in large-scale applications like web search engines

Disadvantages

❌ Hash collisions can cause O(m × n) worst-case time complexity
❌ Hash calculations require additional computation
❌ Not as fast as KMP in single-pattern search


⭐Knuth-Morris-Pratt (KMP) Algorithm


1. Introduction to Knuth-Morris-Pratt (KMP) Algorithm

  • Definition:

    • The Knuth-Morris-Pratt (KMP) Algorithm is a string-searching algorithm that efficiently finds occurrences of a pattern (P) in a text (T) using preprocessing to skip unnecessary comparisons.
    • Unlike Naïve String Matching, KMP avoids redundant re-evaluations by utilizing a precomputed table (LPS - Longest Prefix Suffix Table) to determine shifts.
  • Concept:

    • Instead of moving back in the text upon a mismatch, KMP uses the LPS table to determine how much the pattern should be shifted forward.
    • This makes the search process more efficient, especially for long texts.

Example Explanation

  • Given:
    • Text (T) = "ABABABCABABABC"
    • Pattern (P) = "ABABC"
  • Steps in KMP:
    1. Precompute the LPS table for the pattern "ABABC".
    2. Begin searching the pattern in the text.
    3. If a mismatch occurs, use the LPS table to determine how much to shift instead of restarting from scratch.
    4. Continue the process until the pattern is found or the text ends.

2. Uses of Knuth-Morris-Pratt (KMP) Algorithm

  1. Fast String Searching in Large Texts
    • Used in search engines, file searching, and text editors.
  2. DNA Pattern Matching
    • Used in bioinformatics to match DNA sequences efficiently.
  3. Spam Filtering & Web Crawling
    • Used in detecting keywords in emails and parsing webpages.
  4. Intrusion Detection Systems
    • Detects malware signatures by checking for specific bit patterns.
  5. Network Security & Packet Filtering
    • Used to scan network traffic for specific attack signatures.

3. Working Principle of KMP Algorithm

3.1 Steps of KMP Algorithm

  1. Precompute the LPS (Longest Prefix Suffix) Table for the pattern.
  2. Start comparing the pattern with the text from left to right.
  3. If a mismatch occurs, use the LPS table to skip unnecessary comparisons.
  4. Continue until the entire text is searched.

4. LPS (Longest Prefix Suffix) Table

4.1 What is the LPS Table?

  • The LPS table stores the length of the longest proper prefix that is also a suffix for each prefix of the pattern.
  • It helps in determining the shift upon a mismatch.

4.2 Steps to Construct the LPS Table

  1. Initialize an array LPS of size m (length of the pattern).
  2. Set LPS[0] = 0 (since a single character cannot have a prefix-suffix match).
  3. Compare prefixes and suffixes for each substring of the pattern.
  4. If a match is found, increment the LPS value.
  5. If a mismatch occurs, use the previous LPS value to decide the shift.

4.3 Example of LPS Table Construction

Example (Pattern: "ABABC")

| Index | Pattern prefix | LPS Value | Explanation |
| --- | --- | --- | --- |
| 0 | A | 0 | No proper prefix == suffix |
| 1 | AB | 0 | No proper prefix == suffix |
| 2 | ABA | 1 | "A" is prefix & suffix |
| 3 | ABAB | 2 | "AB" is prefix & suffix |
| 4 | ABABC | 0 | "C" breaks the match |

Final LPS Table: [0, 0, 1, 2, 0]
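
A sketch of the LPS construction; on a mismatch it falls back to the previous LPS value instead of restarting:

```python
def build_lps(pattern):
    """Longest Prefix Suffix table: lps[i] = length of the longest proper
    prefix of pattern[:i+1] that is also a suffix of it."""
    m = len(pattern)
    lps = [0] * m
    length = 0                        # length of the current prefix-suffix
    i = 1                             # lps[0] is always 0
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1
            lps[i] = length           # extend the prefix-suffix match
            i += 1
        elif length > 0:
            length = lps[length - 1]  # fall back to a shorter prefix
        else:
            lps[i] = 0                # no prefix-suffix ends here
            i += 1
    return lps

print(build_lps("ABABC"))  # -> [0, 0, 1, 2, 0], as in the table above
```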

5. Searching for the Pattern in the Text using KMP

5.1 Steps for Pattern Matching

  1. Initialize pointers: i for text, j for pattern.
  2. If T[i] == P[j], increment both i and j.
  3. If j == length of pattern (m), a match is found, shift using LPS.
  4. If T[i] != P[j] and j > 0, use LPS[j-1] to determine how much to shift.
  5. If T[i] != P[j] and j == 0, move i forward.

5.2 Example of Pattern Search Using KMP

Example (Find "ABABC" in "ABABABCABABABC")

  1. Compute LPS Table: [0, 0, 1, 2, 0].
  2. Begin Searching:
    • The first alignment mismatches at "C" (the text has "A" there); the LPS value shifts the pattern without moving backward in the text.
    • "ABABC" then matches the text at index 2 ✅.
  3. Continue scanning → Another match found at index 9 ✅.
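
A search sketch that reuses build_lps from the previous block; note that the text pointer i never moves backward:

```python
def kmp_search(text, pattern):
    """Return all indices where pattern occurs in text.
    Depends on build_lps defined in the sketch above."""
    lps = build_lps(pattern)
    matches, j = [], 0                   # j indexes into the pattern
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]               # mismatch: fall back via LPS
        if ch == pattern[j]:
            j += 1                       # characters agree: advance pattern
        if j == len(pattern):            # full match ending at position i
            matches.append(i - j + 1)
            j = lps[j - 1]               # continue, allowing overlaps
    return matches

print(kmp_search("ABABABCABABABC", "ABABC"))  # -> [2, 9]
```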

6. Time Complexity of KMP Algorithm

  • Preprocessing LPS Table: O(m)
  • Pattern Search: O(n)
  • Total Complexity: O(n + m)
    • Faster than Naïve Algorithm (O(m × n)) since it skips unnecessary comparisons.

7. Optimizations to KMP Algorithm

7.1 Using a Better LPS Construction

  • Avoid redundant calculations while computing LPS values.
  • Use a failure function to efficiently update LPS values.

7.2 Skipping Redundant Comparisons

  • Modify i and j indices based on LPS table to reduce unnecessary checks.

7.3 Parallelizing KMP

  • Divide the text into multiple segments and search in parallel for improved performance.

8. Advantages & Disadvantages of KMP Algorithm

Advantages

✔ Faster than Naïve Algorithm (O(n + m) instead of O(m × n))
✔ Efficient for long texts & repeating patterns
✔ Deterministic: no hash collisions to handle, unlike Rabin-Karp

Disadvantages

❌ Preprocessing overhead (the LPS table takes O(m) extra time and space)
❌ Not ideal for multiple pattern searching (Rabin-Karp is better for that)
❌ Performance similar to Naïve Algorithm in the best case


⭐Naïve Algorithm vs. Rabin-Karp Algorithm vs. KMP Algorithm


| Feature | Naïve Algorithm | Rabin-Karp Algorithm | Knuth-Morris-Pratt (KMP) Algorithm |
| --- | --- | --- | --- |
| Definition | Checks for a match character by character at every position in the text. | Uses hashing to compare substrings efficiently. | Uses preprocessing (LPS table) to skip unnecessary comparisons. |
| Best Case Time Complexity | O(n) (pattern found at start) | O(n) (no hash collisions) | O(n) (few mismatches) |
| Worst Case Time Complexity | O(m × n) (every character checked) | O(m × n) (frequent hash collisions) | O(n + m) (preprocessing + search) |
| Preprocessing Required? | No | Yes (compute hash for pattern) | Yes (compute LPS table) |
| Space Complexity | O(1) (no extra memory needed) | O(1) (a few hash values) | O(m) (for the LPS table) |
| Uses Hashing? | No | Yes | No |
| Uses Precomputed Table? | No | No | Yes (LPS table) |
| Works Well for | Small texts | Searching multiple patterns at once | Searching long repeating patterns |
| Efficiency on Large Texts | Poor | Good (with a strong hash function) | Excellent |
| Can Handle Multiple Patterns? | No | Yes | No |
| Best When | Pattern length is small, and text is short. | Searching multiple patterns efficiently. | Searching in large text files with repeated substrings. |
| Worst When | Large text with many mismatches. | Hash collisions increase time. | The pattern has few repeated prefixes, so the LPS table gives little skipping. |
| Practical Use Cases | Basic text searching, log file scanning. | Plagiarism detection, spam filtering, web crawling. | DNA sequencing, large text searching, network security. |

Key Takeaways:

  1. Naïve Algorithm:
    • Best for small texts but slow for large texts due to O(m × n) complexity.
    • No preprocessing, simple implementation.
  2. Rabin-Karp Algorithm:
    • Fast for multiple pattern searches but collisions can make it slow (O(m × n) in the worst case).
    • Hash function plays a crucial role in efficiency.
  3. KMP Algorithm:
    • Best for single pattern searches in large texts (O(n + m) time complexity).
    • Preprocessing (LPS table) allows efficient skipping of redundant comparisons.

If you need fast single-pattern search, KMP is the best choice.
If you need multi-pattern searching, Rabin-Karp is better.
If the text is small, Naïve Algorithm works fine.


