Unit II: String Matching Algorithms and Computational Geometry
- ⭐Sequential Search and Brute-Force String Matching
- ⭐Closest-Pair and Convex-Hull Problem
- ⭐Exhaustive Search
- ⭐Voronoi Diagrams
- ⭐Naïve String-Matching Algorithm
- 1. Introduction to Naïve String-Matching Algorithm
- 2. Uses of Naïve String-Matching Algorithm
- 3. Working Principle of Naïve String-Matching Algorithm
- 4. Time Complexity of Naïve String-Matching Algorithm
- 5. Optimizations to Naïve String Matching
- 6. Advantages & Disadvantages of Naïve String-Matching Algorithm
- ⭐Rabin-Karp Algorithm
- ⭐Knuth-Morris-Pratt (KMP) Algorithm
- 1. Introduction to Knuth-Morris-Pratt (KMP) Algorithm
- 2. Uses of Knuth-Morris-Pratt (KMP) Algorithm
- 3. Working Principle of KMP Algorithm
- 4. LPS (Longest Prefix Suffix) Table
- 5. Searching for the Pattern in the Text using KMP
- 6. Time Complexity of KMP Algorithm
- 7. Optimizations to KMP Algorithm
- 8. Advantages & Disadvantages of KMP Algorithm
- ⭐Naïve Algorithm vs. Rabin-Karp Algorithm vs. KMP Algorithm
⭐Sequential Search and Brute-Force String Matching
1. Sequential Search
1.1 Introduction to Sequential Search
- Definition:
- Sequential search is a simple searching algorithm where each element in a list or array is checked one by one until the target value is found or the end of the list is reached.
- Concept:
- Start from the first element and compare it with the target value.
- If a match is found, return the position.
- If no match is found, move to the next element.
- Repeat until the target is found or the list ends.
1.2 Uses of Sequential Search
- Unsorted Data Search:
- Works well when the list is not sorted, unlike Binary Search which requires sorted data.
- Small Data Sets:
- Suitable when the number of elements is small (as performance is O(n)).
- Finding Duplicates:
- Used when searching for multiple occurrences of a value in a list.
- Linked Lists & Memory-based Data Structures:
- When elements aren't stored in contiguous memory (as in linked lists).
- Pattern Matching:
- Used in text processing to find words or patterns in documents or files.
1.3 Types of Sequential Search
1.3.1 Iterative Sequential Search
- The simplest method, where each element is checked one by one.
- Best case: O(1) (when the target is the first element).
- Worst case: O(n) (when the target is the last element or not present).
- Start from index 0.
- Compare each element with the target value.
- If a match is found, return the index.
- If the list ends, return -1 (not found).
Example (Search for 5 in {1, 3, 5, 7, 9}):
- Compare 1 → No match.
- Compare 3 → No match.
- Compare 5 → Match found at index 2 ✅
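A minimal Python sketch of the iterative search (the function name and sample list are illustrative):

```python
def sequential_search(arr, target):
    # check each element in order until the target is found
    for i, value in enumerate(arr):
        if value == target:
            return i          # index of the first match
    return -1                 # target not present

print(sequential_search([1, 3, 5, 7, 9], 5))   # 2
```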
1.3.2 Recursive Sequential Search
- The same logic as the iterative search, but implemented recursively.
- The function calls itself until the element is found or the list ends.
- If the list is empty, return -1 (not found).
- Check the first element of the list.
- If it matches the target, return the index.
- Otherwise, search in the remaining list.
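The same search written recursively (a sketch; Python's recursion limit makes this impractical for very long lists):

```python
def recursive_search(arr, target, index=0):
    if index == len(arr):       # list exhausted: not found
        return -1
    if arr[index] == target:    # current element matches
        return index
    return recursive_search(arr, target, index + 1)   # search the rest

print(recursive_search([1, 3, 5, 7, 9], 7))   # 3
```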
1.3.3 Sentinel Search
- A small optimization of sequential search: place the target at the end of the list as a sentinel, eliminating the boundary check on every iteration.
- Store the last element and replace it with the target (sentinel).
- Start searching from the first element.
- If a match is found, return the index.
- If the sentinel is reached, restore the original last element and return -1.
Example (Search for 5 in {1, 3, 5, 7, 9} using Sentinel):
- Set sentinel at end: {1, 3, 5, 7, 5}
- Start searching → Match found before reaching the sentinel ✅
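A sketch of the sentinel variant, assuming a mutable list (the original last element is restored before returning):

```python
def sentinel_search(arr, target):
    last = arr[-1]               # save the original last element
    arr[-1] = target             # place the sentinel
    i = 0
    while arr[i] != target:      # no bounds check needed: the sentinel stops the loop
        i += 1
    arr[-1] = last               # restore the list
    if i < len(arr) - 1 or last == target:
        return i                 # genuine match
    return -1                    # only the sentinel was hit

print(sentinel_search([1, 3, 5, 7, 9], 5))   # 2
```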
2. Brute-Force String Matching
2.1 Introduction to Brute-Force String Matching
- Definition:
- A basic string searching algorithm that checks for a match character by character within a larger text.
- Concept:
- Given a text T of length n and a pattern P of length m,
- Slide P across T one character at a time and check for a complete match at each position.
2.2 Uses of Brute-Force String Matching
- Basic Text Search:
- Searching for keywords in documents, log files, and emails.
- Pattern Detection in DNA Sequences:
- Used in bioinformatics to find genetic patterns.
- Spam Filtering:
- Detecting blacklisted words in emails.
- Simple Applications where Performance is Not Critical:
- Works well in small datasets where efficiency is not a concern.
2.3 Steps of Brute-Force String Matching
- Start at index 0 in the text.
- Compare the first character of the pattern with the text.
- If all characters match, return the index.
- If a mismatch occurs, shift the pattern one position right.
- Repeat until the end of the text is reached.
2.4 Example of Brute-Force String Matching
Example (Find "abc" in "ababc"):
- Compare "abc" with "aba" → ❌ Mismatch
- Compare "abc" with "bab" → ❌ Mismatch
- Compare "abc" with "abc" → ✅ Match found at position 2
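A minimal sketch of the matcher described above (names are illustrative):

```python
def brute_force_match(text, pattern):
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):    # every possible alignment of the pattern
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                # extend the match one character at a time
        if j == m:
            return i              # full match found at position i
    return -1

print(brute_force_match("ababc", "abc"))   # 2
```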
2.5 Time Complexity of Brute-Force String Matching
- Best Case: O(n) (if the pattern is found at the beginning).
- Worst Case: O(m × n) (if every character has to be checked).
2.6 Optimizations of Brute-Force String Matching
- Early termination: if a mismatch occurs early, skip the remaining comparisons at that position.
  - Example: Searching "xyz" in "abcdef" → stop at the first character mismatch.
- Shift heuristics: precompute pattern character frequencies to decide how far to shift next.
- Hashing: instead of comparing characters one by one, use a hash function to compare windows quickly (the idea behind the Rabin-Karp algorithm).
3. Comparison Between Sequential Search & Brute-Force String Matching
Feature | Sequential Search | Brute-Force String Matching |
---|---|---|
Definition | Search an element in a list | Search a substring in a text |
Best Case Complexity | O(1) (element at start) | O(n) (match at start) |
Worst Case Complexity | O(n) (element not found) | O(m × n) (every comparison needed) |
Data Type | Works with numbers/lists | Works with text/strings |
Use Case | Searching in arrays/lists | Searching in documents, databases |
Efficiency | Works well for small lists | Inefficient for large texts |
⭐Closest-Pair and Convex-Hull Problem
1. Closest-Pair Problem
1.1 Introduction to Closest-Pair Problem
Definition:
- The Closest-Pair Problem finds the two closest points in a given set of points on a 2D plane.
- The distance between two points P₁(x₁, y₁) and P₂(x₂, y₂) is calculated using the Euclidean Distance Formula: d(P₁, P₂) = √((x₂ − x₁)² + (y₂ − y₁)²)
Concept:
- Given n points in a 2D plane, find two points that have the smallest distance between them.
1.2 Uses of Closest-Pair Problem
- Geographical Mapping
- Used in GPS navigation and nearest location detection (e.g., "Find the nearest ATM").
- Computer Vision
- Used in image recognition for detecting closest object clusters.
- Clustering in Machine Learning
- Helps in grouping similar data points based on proximity.
- Networking & Communication Systems
- Used in wireless networks to find the nearest access point for connection.
- Astronomy & Physics Simulations
- Used in space studies to find the closest celestial bodies.
1.3 Approaches to Solve the Closest-Pair Problem
1.3.1 Brute-Force Approach
- Definition:
- The simplest method that compares every pair of points to find the closest one.
- Steps:
- Compare each point with every other point.
- Calculate the Euclidean distance for each pair.
- Keep track of the minimum distance found.
Example (Points: A(1,1), B(4,2), C(3,5)):
- Compute distances:
  - d(A, B) = √10 ≈ 3.16
  - d(A, C) = √20 ≈ 4.47
  - d(B, C) = √10 ≈ 3.16
- Closest pair = (A, B) or (B, C) with distance 3.16 ✅
- Time Complexity: O(n²)
  - Every point is compared with every other point, so the number of comparisons is n(n-1)/2.
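A brute-force sketch using the example points (itertools enumerates the n(n-1)/2 pairs):

```python
import math
from itertools import combinations

def closest_pair_brute_force(points):
    # compare every pair of points and keep the minimum distance
    best = min(combinations(points, 2),
               key=lambda pair: math.dist(pair[0], pair[1]))
    return best, math.dist(best[0], best[1])

pts = [(1, 1), (4, 2), (3, 5)]   # A, B, C from the example
print(closest_pair_brute_force(pts))
# (((1, 1), (4, 2)), 3.1622776601683795)
```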
1.3.2 Divide-and-Conquer Approach
- Definition:
- A more efficient approach that divides the set of points into two halves, finds the closest pair in each half, and merges the results.
- Steps:
- Sort the points based on x-coordinates.
- Divide the points into two halves.
- Find the closest pair in both halves recursively.
- Merge step: Find the closest pair across the two halves.
- Return the minimum distance found.
Example (Points: A(1,1), B(4,2), C(3,5), D(7,8)):
- Sort points by x-coordinate → {A(1,1), C(3,5), B(4,2), D(7,8)}
- Divide into two halves: {A, C} and {B, D}
- Find the closest pair in each half recursively.
- Merge step: check points near the dividing line for a closer cross-boundary pair.
- Time Complexity: O(n log n)
- Sorting takes O(n log n), and merging takes O(n).
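A compact divide-and-conquer sketch (simplified: the strip step relies on the standard fact that only a bounded number of y-sorted neighbors can beat the current best distance):

```python
import math

def closest_pair(points):
    pts = sorted(points)                      # sort once by x-coordinate
    def solve(p):
        if len(p) <= 3:                       # base case: brute force
            return min((math.dist(a, b), (a, b))
                       for i, a in enumerate(p) for b in p[i + 1:])
        mid = len(p) // 2
        mid_x = p[mid][0]
        best = min(solve(p[:mid]), solve(p[mid:]))
        # merge step: only points within best distance of the dividing line
        strip = sorted((q for q in p if abs(q[0] - mid_x) < best[0]),
                       key=lambda q: q[1])
        for i, a in enumerate(strip):
            for b in strip[i + 1:i + 8]:      # a constant number of neighbors suffices
                best = min(best, (math.dist(a, b), (a, b)))
        return best
    return solve(pts)

print(closest_pair([(1, 1), (4, 2), (3, 5), (7, 8)]))
# (3.1622776601683795, ((1, 1), (4, 2)))
```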
2. Convex Hull Problem
2.1 Introduction to Convex Hull
- Definition:
- A Convex Hull is the smallest convex polygon that can enclose a given set of points in a plane.
- It is like stretching a rubber band around the given points, and the points touching the band form the Convex Hull.
2.2 Uses of Convex Hull
- Geographic Information Systems (GIS)
- Used to find boundaries of a city or country on a map.
- Computer Graphics & Image Processing
- Used in object detection and shape recognition.
- Collision Detection in Games
- Used in game development for detecting boundaries of objects.
- Machine Learning & Data Clustering
- Used in defining data boundaries in unsupervised learning.
- Robotics & Navigation
- Helps robots avoid obstacles by mapping their surroundings.
2.3 Approaches to Solve the Convex Hull Problem
2.3.1 Brute-Force Approach
Definition:
- A basic method where every pair of points is checked to see if it forms part of the hull.
Steps:
- Consider all pairs of points and check if they form a hull edge.
- A point is on the hull if all other points lie on one side of the line formed by the pair.
- Collect all such boundary edges to form the convex hull.
Time Complexity: O(n³)
- Since each pair of points is checked with all other points.
2.3.2 Graham's Scan
- Definition:
- A more efficient method that sorts points based on angles and constructs the hull using a stack.
- Steps:
- Find the lowest point (P0) (smallest y-coordinate).
- Sort remaining points based on the angle with P0.
- Use a stack to construct the hull by checking left-turns and right-turns.
Example (Find Convex Hull for A(1,1), B(4,2), C(3,5), D(7,8)):
- Select the lowest point → P0 = A(1,1)
- Sort the remaining points by polar angle around P0 → {B, D, C}
- Use a stack:
  - Push A, then B, then D, then C, checking the turn direction at each step.
  - If a right turn is detected, pop the last point from the stack before pushing.
- Here every step is a left turn, so all four points end up on the hull.
- Time Complexity: O(n log n)
- Sorting takes O(n log n), and scanning takes O(n).
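A sketch of Graham's scan under the conventions above (lowest point first, counterclockwise angle sort; collinear points are simply dropped):

```python
import math

def cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a left turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def graham_scan(points):
    p0 = min(points, key=lambda p: (p[1], p[0]))       # lowest point, ties by x
    rest = sorted((p for p in points if p != p0),
                  key=lambda p: math.atan2(p[1] - p0[1], p[0] - p0[0]))
    hull = [p0]
    for p in rest:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()            # right turn (or collinear): discard the last point
        hull.append(p)
    return hull

print(graham_scan([(1, 1), (4, 2), (3, 5), (7, 8)]))
# [(1, 1), (4, 2), (7, 8), (3, 5)]
```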
2.3.3 Jarvis March (Gift Wrapping)
- Definition:
  - A method that starts at the leftmost point and repeatedly selects the next extreme point, wrapping around the set like a ribbon around a gift.
- Steps:
- Start with leftmost point.
- Select the next extreme point that forms the smallest left turn.
- Repeat until the starting point is reached.
Example Explanation (Find Convex Hull for A(1,1), B(4,2), C(3,5), D(7,8))
- Start with the leftmost point (A).
- Find the next extreme point (B).
- Find the next extreme point (D).
- Find the next extreme point (C).
- Return to the starting point to complete the hull: A → B → D → C → A.
- Time Complexity: O(nh)
- h = number of hull points, making it efficient for small hulls.
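A matching gift-wrapping sketch (assumes at least two distinct points and no collinear hull edges, to keep it short):

```python
def cross(o, a, b):
    # > 0: left turn, < 0: right turn, 0: collinear
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def jarvis_march(points):
    start = min(points)                 # leftmost point (ties broken by y)
    hull, p = [], start
    while True:
        hull.append(p)
        q = points[0] if points[0] != p else points[1]
        for r in points:
            if cross(p, q, r) < 0:      # r lies to the right of p->q: q is not extreme
                q = r
        p = q
        if p == start:                  # wrapped all the way around
            return hull

print(jarvis_march([(1, 1), (4, 2), (3, 5), (7, 8)]))
# [(1, 1), (4, 2), (7, 8), (3, 5)]
```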
⭐Exhaustive Search
1. Introduction to Exhaustive Search
Definition:
- Exhaustive search (also called brute-force search) is a problem-solving approach where all possible solutions are systematically explored to find the best or correct one.
- It is guaranteed to find the optimal solution but may be computationally expensive for large inputs.
Concept:
- Given a problem, try every possible combination and select the best one based on the given constraints.
- Used when no efficient algorithm exists or when ensuring optimality is necessary.
2. Uses of Exhaustive Search
- Finding the Best Solution in Optimization Problems
- Example: Traveling Salesman Problem (TSP) – Checking all routes to find the shortest path.
- Brute-Force Pattern Matching
- Example: Checking all possible positions of a substring within a text.
- Breaking Cryptographic Algorithms
- Example: Brute-force password cracking by trying every combination.
- Solving NP-Complete Problems
- Example: Graph Coloring, Knapsack Problem, and other combinatorial problems.
- Path Finding in Games & AI
- Example: Finding the best move in Chess by evaluating all possibilities.
3. Types of Exhaustive Search
3.1 Complete Exhaustive Search
- Tries all possible solutions and guarantees an optimal answer.
- Example: Checking all paths in a maze to find the shortest one.
3.2 Partial Exhaustive Search
- Tries a subset of solutions when full exploration is impractical.
- Example: AI in Chess, where only likely good moves are explored.
3.3 Optimized Exhaustive Search
- Applies pruning techniques to eliminate unlikely solutions.
- Example: Branch and Bound, which skips unnecessary calculations.
4. Operations in Exhaustive Search
4.1 Generating All Possible Solutions
- Definition:
- Systematically list all potential candidates that could be a solution.
Logic to Generate All Solutions
- Identify the set of possible choices.
- Construct a recursive or iterative approach to list all possible combinations.
- Store each generated solution for evaluation.
Example (Generate all subsets of {1, 2, 3})
Possible subsets: { {}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3} } ✅
Time Complexity: O(2ⁿ) (each element can be included or excluded).
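A short sketch of subset generation with itertools (one of several ways to enumerate all 2ⁿ subsets):

```python
from itertools import combinations

def all_subsets(items):
    # choose every possible subset size r, then every combination of that size
    return [set(c) for r in range(len(items) + 1)
                   for c in combinations(items, r)]

print(all_subsets([1, 2, 3]))
# [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
```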
4.2 Checking Feasibility of Each Solution
- Definition:
- After generating a solution, check whether it satisfies the problem’s constraints.
Logic to Check Feasibility
- Verify that the solution does not violate problem constraints.
- If invalid, discard the solution and proceed to the next.
Example (Knapsack Problem - Can a subset of weights sum to 10?)
Given weights {4, 6, 8, 3}, check subsets:
- {4, 6} → 10 ✅
- {8, 3} → 11 ❌ (Discarded)
Time Complexity: O(2ⁿ) (checking all subsets).
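A feasibility-check sketch for this example (returns the first subset that sums to the target, or None):

```python
from itertools import combinations

def subset_with_sum(weights, target):
    # exhaustively test every non-empty subset against the constraint
    for r in range(1, len(weights) + 1):
        for subset in combinations(weights, r):
            if sum(subset) == target:
                return subset          # first feasible solution found
    return None                        # no subset satisfies the constraint

print(subset_with_sum([4, 6, 8, 3], 10))   # (4, 6)
```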
4.3 Selecting the Best Solution
- Definition:
- After checking feasibility, pick the optimal solution based on the given criteria.
Logic to Select the Best Solution
- Keep track of the best (or shortest, lowest cost, highest value) solution found so far.
- Compare each new solution with the current best solution.
- If better, update the best solution.
Example (Find the shortest route among all possible paths in a graph)
Paths: {A→B→C = 10, A→C→B = 8, A→B→D→C = 15}
Best solution: {A→C→B = 8} ✅
Time Complexity: O(n!) (all paths are checked).
5. Example Problems Using Exhaustive Search
5.1 Traveling Salesman Problem (TSP)
- Problem Statement:
- A salesman must visit n cities, traveling the shortest possible route while visiting each city exactly once and returning to the starting point.
- Solution Using Exhaustive Search:
- Generate all possible city visit orders.
- Compute the total travel distance for each route.
- Choose the shortest route.
- Time Complexity: O(n!) (since all permutations of cities are checked).
Example (TSP for 3 cities: A, B, C)
- Possible routes:
{A→B→C→A, A→C→B→A, B→A→C→B, ...}
- Compute distances for all routes.
- Select the shortest route ✅.
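A brute-force TSP sketch; the distance table below is hypothetical, just to make the example runnable:

```python
from itertools import permutations

dist = {('A', 'B'): 4, ('A', 'C'): 3, ('B', 'C'): 2}   # assumed symmetric distances

def d(u, v):
    return dist.get((u, v)) or dist[(v, u)]

def tsp_brute_force(cities, start):
    best_route, best_len = None, float('inf')
    # try every ordering of the remaining cities
    for perm in permutations([c for c in cities if c != start]):
        route = (start,) + perm + (start,)
        length = sum(d(route[i], route[i + 1]) for i in range(len(route) - 1))
        if length < best_len:
            best_route, best_len = route, length
    return best_route, best_len

print(tsp_brute_force(['A', 'B', 'C'], 'A'))
# (('A', 'B', 'C', 'A'), 9): with 3 cities, both directions tie at 9
```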
5.2 Knapsack Problem (Subset-Sum Problem)
- Problem Statement:
- Given n items with weights and values, find the subset that maximizes value without exceeding a given weight limit W.
- Solution Using Exhaustive Search:
- Generate all possible subsets of items.
- Compute the total weight and value for each subset.
- Select the subset with the highest value that satisfies the weight limit.
- Time Complexity: O(2ⁿ).
Example (Items: {A(3kg, $6), B(2kg, $5), C(4kg, $8)}, Weight Limit = 5kg)
- Possible subsets:
{ {}, {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C} }
- Compute total weight and value for each subset.
- Select the best subset within 5kg ✅.
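A sketch of the exhaustive check for these items (weights in kg, values in dollars):

```python
from itertools import combinations

items = [('A', 3, 6), ('B', 2, 5), ('C', 4, 8)]   # (name, weight, value)
LIMIT = 5

best_subset, best_value = (), 0
for r in range(len(items) + 1):
    for subset in combinations(items, r):
        weight = sum(w for _, w, _ in subset)
        value = sum(v for _, _, v in subset)
        if weight <= LIMIT and value > best_value:   # feasible and better
            best_subset, best_value = subset, value

print(best_subset, best_value)
# (('A', 3, 6), ('B', 2, 5)) 11: A + B weighs exactly 5 kg and is worth $11
```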
5.3 Graph Coloring Problem
- Problem Statement:
- Assign colors to graph vertices such that no two adjacent vertices share the same color, using the fewest colors possible.
- Solution Using Exhaustive Search:
- Try all color assignments for the vertices.
- If a coloring violates adjacency constraints, discard it.
- Choose the solution with the minimum colors.
- Time Complexity: O(kⁿ) (where k is the number of colors).
Example (Graph with vertices A, B, C, and edges A-B, B-C, C-A)
- Possible colorings:
  - {A=Red, B=Green, C=Blue} ✅
  - {A=Red, B=Red, C=Blue} ❌ (Invalid: A & B have the same color.)
- Select the valid coloring with the fewest colors ✅.
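A sketch that tries 1 color, then 2, then 3 over all assignments and returns the first valid coloring (vertices and edges from the example):

```python
from itertools import product

def min_coloring(vertices, edges, colors):
    # grow the palette one color at a time, exhaustively testing assignments
    for k in range(1, len(colors) + 1):
        for assignment in product(colors[:k], repeat=len(vertices)):
            coloring = dict(zip(vertices, assignment))
            if all(coloring[u] != coloring[v] for u, v in edges):
                return coloring        # first valid coloring with k colors
    return None

print(min_coloring(['A', 'B', 'C'],
                   [('A', 'B'), ('B', 'C'), ('C', 'A')],
                   ['Red', 'Green', 'Blue']))
# {'A': 'Red', 'B': 'Green', 'C': 'Blue'}: a triangle needs all 3 colors
```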
6. Limitations of Exhaustive Search
- Exponential Growth
- In large datasets, checking all possibilities becomes impractical.
- Computationally Expensive
- Problems like TSP and Knapsack take O(n!) or O(2ⁿ) time.
- Inefficiency in Real-World Applications
- Other algorithms (e.g., Dynamic Programming, Greedy Algorithms) are faster and more efficient.
7. Optimizing Exhaustive Search
- Backtracking: Eliminates unnecessary paths early (e.g., N-Queens Problem).
- Branch and Bound: Cuts off suboptimal branches in optimization problems.
- Heuristic Approaches: Uses approximation (e.g., Genetic Algorithms, Simulated Annealing).
⭐Voronoi Diagrams
1. Introduction to Voronoi Diagrams
- Definition:
- A Voronoi diagram is a geometric data structure that partitions a plane into regions based on proximity to a given set of points.
- Each region contains all points that are closer to one specific seed point than any other.
- Concept:
- Given a set of n points (called sites or generators) in a 2D plane, the plane is divided into n convex polygons where:
- Each polygon contains points closest to a specific site.
- The boundaries between regions are perpendicular bisectors of the segments between seed points.
Example Explanation
- Suppose we have three points:
- A(1,3), B(4,5), C(6,2)
- The Voronoi diagram consists of three regions, where:
- Region A contains all points closest to A.
- Region B contains all points closest to B.
- Region C contains all points closest to C.
- The boundaries between these regions are straight lines equidistant from two seed points.
2. Uses of Voronoi Diagrams
- Geographical Mapping & Nearest Neighbor Search
- Used in city planning, cell tower placement, and nearest hospital detection.
- Computational Geometry & Artificial Intelligence
- Used in path planning for robots and AI-based decision-making.
- Network Optimization & Wireless Communication
- Helps in optimal placement of cell towers to maximize coverage.
- Medical Imaging & Biology
- Used in analyzing tissue structures and cell distribution.
- Computer Graphics & Game Development
- Used in procedural terrain generation and AI navigation systems.
- Astronomy & Physics Simulations
- Used for modeling galaxy distributions and crystal structures.
3. Types of Voronoi Diagrams
3.1 Ordinary Voronoi Diagram (Basic Type)
- Each point has one region where all points are closest to it.
3.2 Weighted Voronoi Diagram
- Some points have higher influence (priority) and expand their regions accordingly.
- Used in meteorology for rain and wind distribution modeling.
3.3 Additively Weighted Voronoi Diagram (Power Diagram)
- The distance is modified by an additive weight, changing region sizes.
3.4 Multiplicatively Weighted Voronoi Diagram
- Each site has a scaling factor, adjusting distance computations.
- Used in economic models where regions represent market dominance.
3.5 K-Order Voronoi Diagram
- Instead of finding the closest site, this finds the k nearest sites to each point.
- Used in logistics and emergency service planning.
3.6 Farthest-Point Voronoi Diagram
- Instead of dividing space by closest points, it partitions based on farthest points.
- Used in evacuation route planning.
4. Construction of Voronoi Diagrams
4.1 Brute-Force Construction (O(n²))
- Definition:
- The simplest approach that checks the distance of every point to all given sites.
- Steps:
- For each pixel or grid point in the plane, compute its distance to all given sites.
- Assign the point to the closest site.
- The boundaries where distances are equal form Voronoi edges.
- Time Complexity: O(n²)
- Since every point checks its distance to all other points.
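A grid-based sketch of the brute-force construction (every cell is assigned to its nearest site; the grid size is illustrative):

```python
import math

def voronoi_grid(sites, width, height):
    # assign each grid cell to the closest site: O(width x height x n)
    return {(x, y): min(sites, key=lambda s: math.dist((x, y), s))
            for x in range(width) for y in range(height)}

sites = [(1, 3), (4, 5), (6, 2)]     # A, B, C from the example above
regions = voronoi_grid(sites, 8, 8)
print(regions[(0, 0)])               # (1, 3): the origin falls in A's region
```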
4.2 Fortune’s Algorithm (O(n log n))
- Definition:
- A sweep-line algorithm that constructs Voronoi diagrams efficiently in O(n log n) time.
- Steps:
- Sort all points by their y-coordinates.
- Use a parabolic sweep-line from top to bottom to maintain a dynamic beach line.
- When the sweep-line encounters a site, insert an arc into the beach line.
- When it encounters a Voronoi edge intersection, remove an arc.
- Store the boundary points to construct the final Voronoi diagram.
- Time Complexity: O(n log n)
- Sorting takes O(n log n), and maintaining the beach line takes O(n).
Example Explanation
- Given points A(2,3), B(5,7), C(8,2):
- Start sweeping top to bottom.
- Maintain a parabolic beach line for active sites.
- Insert Voronoi boundaries at equal-distance intersections.
- Continue until the entire plane is partitioned.
4.3 Incremental Algorithm (O(n log n))
- Definition:
- Adds one site at a time and updates the Voronoi diagram incrementally.
- Steps:
- Start with a single site.
- Insert new points one by one.
- Adjust Voronoi boundaries dynamically.
- Continue until all points are added.
- Time Complexity: O(n log n)
- Since each insertion requires local adjustments in O(log n) time.
5. Operations in Voronoi Diagrams
5.1 Finding Nearest Neighbor (Point Location Query)
- Definition:
- Given a query point, determine which Voronoi region it belongs to.
- Logic:
- Compute the distance of the query point to each site.
- Find the minimum distance and assign the point to that region.
- Time Complexity: O(log n) (using a spatial search structure).
Example Explanation (Find nearest point to Q(3,4))
- Given sites A(1,1), B(5,5), C(7,3),
- Compute distances:
- d(Q, A) = 3.6
- d(Q, B) = 2.2 ✅ (Nearest)
- d(Q, C) = 4.1
- Point Q belongs to Voronoi region of B ✅.
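A brute-force version of this query (O(n) per query; the O(log n) bound above assumes a spatial index such as a k-d tree):

```python
import math

sites = {'A': (1, 1), 'B': (5, 5), 'C': (7, 3)}
q = (3, 4)

# point location by direct distance comparison
nearest = min(sites, key=lambda name: math.dist(q, sites[name]))
print(nearest, round(math.dist(q, sites[nearest]), 1))   # B 2.2
```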
5.2 Voronoi Edge Detection
- Definition:
- Finds boundaries between Voronoi regions.
- Logic:
- Compute perpendicular bisectors between each pair of sites.
- Find their intersection points.
- Store these points as Voronoi edges.
- Time Complexity: O(n log n)
5.3 Merging Two Voronoi Diagrams
- Definition:
- Combines two separate Voronoi diagrams into one.
- Logic:
- Compute Voronoi diagrams independently.
- Overlay them on the same plane.
- Merge edges and resolve intersections.
- Time Complexity: O(n log n)
6. Limitations of Voronoi Diagrams
- Sensitive to Input Data
- Small changes in site positions drastically alter Voronoi regions.
- Computationally Expensive for Large Data Sets
- O(n log n) complexity still grows large for millions of sites.
- Not Suitable for Dynamic Environments
- Requires recomputation if points are frequently added/removed.
⭐Naïve String-Matching Algorithm
1. Introduction to Naïve String-Matching Algorithm
- Definition:
- The Naïve String-Matching Algorithm is a basic pattern searching algorithm that checks for a match character by character within a given text.
- It slides the pattern one character at a time over the text and compares each substring with the pattern.
- Concept:
- Given a text T of length n and a pattern P of length m, the algorithm compares P with every substring of T.
- If a match is found, return the index of occurrence.
Example Explanation
- Given:
- Text (T) = "ABABABCABABABC"
- Pattern (P) = "ABABC"
- The algorithm starts comparing the pattern with each substring of the text:
  - Compare "ABABC" with "ABABA" → Mismatch (shift right)
  - Compare "ABABC" with "BABAB" → Mismatch (shift right)
  - Compare "ABABC" with "ABABC" → Match found at index 2 ✅
2. Uses of Naïve String-Matching Algorithm
- Text Searching & Information Retrieval
- Finding words in documents (e.g., searching for "error" in a log file).
- Pattern Recognition in DNA Sequences
- Searching for genetic patterns in DNA sequences.
- Spam Detection & Text Filtering
- Detecting blacklisted words in emails.
- String Searching in Databases
- Locating specific records in large datasets.
- Search Operations in Text Editors
- Find and replace functionalities in word processors.
3. Working Principle of Naïve String-Matching Algorithm
3.1 Steps of Naïve String Matching
- Start with index 0 in the text.
- Compare the first character of the pattern with the current character in the text.
- If a character mismatch occurs, shift the pattern one step to the right.
- If all characters match, return the index of occurrence.
- Repeat until the end of the text is reached.
3.2 Example of Naïve String-Matching Algorithm
Example (Find "AB" in "ABACABAB")
Step-by-step matching process:
- Compare "AB" with "AB" → ✅ Match found at index 0.
- Shift right and compare "AB" with "BA" → ❌ Mismatch.
- Shift right and compare "AB" with "AC" → ❌ Mismatch.
- Shift right and compare "AB" with "CA" → ❌ Mismatch.
- Shift right and compare "AB" with "AB" → ✅ Match found at index 4.
- Shift right and compare "AB" with "BA" → ❌ Mismatch.
- Shift right and compare "AB" with "AB" → ✅ Match found at index 6.
Final Output: Pattern occurs at indices {0, 4, 6} ✅
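A sketch of the naïve matcher that collects every occurrence index (names are illustrative):

```python
def naive_search(text, pattern):
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):          # each alignment of the pattern
        if text[i:i + m] == pattern:    # character-by-character comparison
            matches.append(i)
    return matches

print(naive_search("ABACABAB", "AB"))   # [0, 4, 6]
```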
4. Time Complexity of Naïve String-Matching Algorithm
- Best Case: O(n)
  - Occurs when the pattern matches within the first few comparisons or the first letter of the pattern rarely appears in the text.
- Worst Case: O(m × n)
  - Occurs when nearly every character has to be checked (e.g., searching for "AAA" in "AAAAA").
5. Optimizations to Naïve String Matching
5.1 Early Termination Optimization
- If a mismatch occurs early, skip the remaining comparisons and shift right.
- Example: Searching "XYZ" in "ABCDEF" → stop at the first mismatch.
5.2 Using Frequency Analysis
- If the last character of the pattern is rare in the text, shift multiple steps at once.
- Example: Searching "ZZZ" in "ABCDEFGH" → shift immediately after the first mismatch.
5.3 Use of Hashing (Similar to Rabin-Karp Algorithm)
- Instead of comparing character by character, use a hash function to quickly compare patterns.
6. Advantages & Disadvantages of Naïve String-Matching Algorithm
Advantages
✔ Simple & Easy to Implement
✔ Works well for small texts & patterns
✔ No extra memory needed (compared to KMP)
Disadvantages
✖ Inefficient for Large Texts (O(m × n) worst case)
✖ Repeated Comparisons increase time complexity
✖ Not practical for long pattern searches
⭐Rabin-Karp Algorithm
1. Introduction to Rabin-Karp Algorithm
Definition:
- The Rabin-Karp Algorithm is a string-searching algorithm that uses hashing to efficiently search for a pattern in a text.
- Instead of comparing characters one by one like the Naïve approach, it computes a hash value for the pattern and substrings of the text.
- If the hash values match, it performs a character-by-character check to confirm a valid match.
Concept:
- Given a text T of length n and a pattern P of length m, the algorithm:
  - Computes the hash of P (the pattern).
  - Computes the hash of the first m characters of T (the initial window).
  - Slides the window over the text one position at a time, updating the hash value dynamically.
  - If the hashes match, performs a character-by-character check to confirm.
2. Uses of Rabin-Karp Algorithm
- String Searching in Large Texts
- Used in search engines to match queries quickly.
- Detecting Plagiarism
- Used to compare chunks of text across documents.
- Pattern Matching in DNA Sequences
- Used in bioinformatics to locate gene sequences.
- Spam Filtering
- Identifies spam messages based on word patterns.
- Searching Multiple Patterns in a Single Pass
- Unlike Naïve and KMP algorithms, Rabin-Karp can search multiple patterns at once.
3. Working Principle of Rabin-Karp Algorithm
3.1 Steps of Rabin-Karp Algorithm
- Compute the hash of the pattern.
- Compute the hash of the first substring of the text.
- Slide the window over the text one character at a time, updating the hash dynamically.
- If the hash matches, perform a character-by-character check to confirm the match.
- Continue until the entire text is scanned.
3.2 Example of Rabin-Karp Algorithm
Example (Find "AB" in "ABACABAB")
- Given:
- Text (T) = "ABACABAB"
- Pattern (P) = "AB"
- Step-by-step process:
  - Compute the hash of "AB" → Hash(P) = H1
  - Compute the hash of the first two characters of T ("AB") → Hash(T[0:2]) = H2
  - Compare H1 and H2:
    - If H1 ≠ H2, shift the pattern right and update the hash.
    - If H1 = H2, do a character-by-character check to confirm.
  - Repeat until the end of the text.
Final Output: Pattern occurs at indices {0, 4, 6} ✅
4. Hash Function in Rabin-Karp Algorithm
- Definition:
- A hash function is used to convert a string into a numerical value for quick comparisons.
- Rolling Hash Technique:
- Instead of recalculating the entire hash from scratch for each shift, use a rolling hash function to update the hash in O(1) time.
Rolling Hash Formula (Modular Arithmetic)
hash(T[s+1 .. s+m]) = (d × (hash(T[s .. s+m−1]) − T[s] × d^(m−1)) + T[s+m]) mod q
Where:
- d = base (e.g., 256 for ASCII characters).
- m = length of the pattern.
- q = a prime number to reduce hash collisions.
Example Explanation (Sliding Hash Update)
- Assume P = "AB" and T = "ABACABAB".
- Compute the hash of "AB" (H1).
- Compute the hash of "BA" (H2) by removing the first character's contribution and adding the next character.
- Compute the hash of "AC" (H3) similarly.
- Use the modulo operation (% q) to keep values small and reduce collisions.
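A sketch of Rabin-Karp with the rolling hash above (d = 256 and q = 101 are conventional illustrative choices):

```python
def rabin_karp(text, pattern, d=256, q=101):
    n, m = len(text), len(pattern)
    h = pow(d, m - 1, q)                 # d^(m-1) mod q, used to drop the leading char
    p_hash = t_hash = 0
    for i in range(m):                   # hashes of the pattern and the first window
        p_hash = (d * p_hash + ord(pattern[i])) % q
        t_hash = (d * t_hash + ord(text[i])) % q
    matches = []
    for s in range(n - m + 1):
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)            # hash hit, confirmed character by character
        if s < n - m:                    # roll the hash to the next window
            t_hash = (d * (t_hash - ord(text[s]) * h) + ord(text[s + m])) % q
    return matches

print(rabin_karp("ABACABAB", "AB"))      # [0, 4, 6]
```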
5. Time Complexity of Rabin-Karp Algorithm
- Best Case: O(n)
- Occurs when there are few hash collisions, requiring fewer character comparisons.
- Worst Case: O(m × n)
- Occurs when hash collisions happen frequently, leading to multiple character comparisons.
6. Optimizations to Rabin-Karp Algorithm
6.1 Using a Good Hash Function
- A prime number (q) helps reduce hash collisions.
- Using higher base values (d) minimizes repeated hash values.
6.2 Avoiding Hash Collisions
- If multiple substrings have the same hash, do a full character comparison before confirming a match.
- Increase q to a larger prime number to reduce duplicate hash values.
6.3 Searching for Multiple Patterns Simultaneously
- Store hashes of multiple patterns and compare them in a single pass.
- Useful in plagiarism detection and spam filtering.
7. Advantages & Disadvantages of Rabin-Karp Algorithm
Advantages
✔ Efficient for multiple pattern searching
✔ Faster than Naïve algorithm in best case (O(n))
✔ Can be used in large-scale applications like web search engines
Disadvantages
✖ Hash collisions can cause O(m × n) worst-case time complexity
✖ Hash calculations require additional computation
✖ Not as fast as KMP in single-pattern search
⭐Knuth-Morris-Pratt (KMP) Algorithm
1. Introduction to Knuth-Morris-Pratt (KMP) Algorithm
Definition:
- The Knuth-Morris-Pratt (KMP) Algorithm is a string-searching algorithm that efficiently finds occurrences of a pattern (P) in a text (T) using preprocessing to skip unnecessary comparisons.
- Unlike Naïve String Matching, KMP avoids redundant re-evaluations by utilizing a precomputed table (LPS - Longest Prefix Suffix Table) to determine shifts.
Concept:
- Instead of moving back in the text upon a mismatch, KMP uses the LPS table to determine how much the pattern should be shifted forward.
- This makes the search process more efficient, especially for long texts.
Example Explanation
- Given:
- Text (T) = "ABABABCABABABC"
- Pattern (P) = "ABABC"
- Steps in KMP:
  - Precompute the LPS table for the pattern "ABABC".
  - Begin searching for the pattern in the text.
  - If a mismatch occurs, use the LPS table to determine how far to shift instead of restarting from scratch.
  - Continue the process until the pattern is found or the text ends.
2. Uses of Knuth-Morris-Pratt (KMP) Algorithm
- Fast String Searching in Large Texts
- Used in search engines, file searching, and text editors.
- DNA Pattern Matching
- Used in bioinformatics to match DNA sequences efficiently.
- Spam Filtering & Web Crawling
- Used in detecting keywords in emails and parsing webpages.
- Intrusion Detection Systems
- Detects malware signatures by checking for specific bit patterns.
- Network Security & Packet Filtering
- Used to scan network traffic for specific attack signatures.
3. Working Principle of KMP Algorithm
3.1 Steps of KMP Algorithm
- Precompute the LPS (Longest Prefix Suffix) Table for the pattern.
- Start comparing the pattern with the text from left to right.
- If a mismatch occurs, use the LPS table to skip unnecessary comparisons.
- Continue until the entire text is searched.
4. LPS (Longest Prefix Suffix) Table
4.1 What is the LPS Table?
- The LPS table stores the length of the longest proper prefix that is also a suffix for each prefix of the pattern.
- It helps in determining the shift upon a mismatch.
4.2 Steps to Construct the LPS Table
- Initialize an array LPS of size m (the length of the pattern).
- Set LPS[0] = 0 (a single character has no proper prefix that is also a suffix).
- Compare prefixes and suffixes for each prefix of the pattern.
- If a match is found, increment the LPS value.
- If a mismatch occurs, fall back to the previous LPS value to decide the shift.
4.3 Example of LPS Table Construction
Example (Pattern: "ABABC")
Index | Pattern | LPS Value | Explanation |
---|---|---|---|
0 | A | 0 | No proper prefix == suffix |
1 | AB | 0 | No proper prefix == suffix |
2 | ABA | 1 | "A" is prefix & suffix |
3 | ABAB | 2 | "AB" is prefix & suffix |
4 | ABABC | 0 | "C" breaks the match |
Final LPS Table: [0, 0, 1, 2, 0] ✅
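A sketch of the LPS construction matching these steps:

```python
def build_lps(pattern):
    m = len(pattern)
    lps = [0] * m
    length = 0                 # length of the current matching prefix-suffix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1        # the border extends by one character
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]   # fall back to the previous shorter border
        else:
            lps[i] = 0
            i += 1
    return lps

print(build_lps("ABABC"))      # [0, 0, 1, 2, 0]
```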
5. Searching for the Pattern in the Text using KMP
5.1 Steps for Pattern Matching
- Initialize pointers: i for the text, j for the pattern.
- If T[i] == P[j], increment both i and j.
- If j == m (the length of the pattern), a match is found at index i − j; set j = LPS[j − 1] and continue searching.
- If T[i] != P[j] and j > 0, set j = LPS[j − 1] instead of moving i backwards.
- If T[i] != P[j] and j == 0, move i forward.
5.2 Example of Pattern Search Using KMP
Example (Find "ABABC" in "ABABABCABABABC")
- Compute the LPS table: [0, 0, 1, 2, 0].
- Begin searching:
  - "ABABC" matches the substring starting at index 2 ✅.
  - On a later mismatch at "C", shift the pattern using the LPS value instead of restarting.
  - Continue scanning → another match found at index 9 ✅.
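A self-contained KMP sketch (it rebuilds the LPS table internally, then scans without ever moving the text pointer backwards):

```python
def kmp_search(text, pattern):
    # build the LPS (failure) table, as in section 4
    m = len(pattern)
    lps, length = [0] * m, 0
    for i in range(1, m):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    # scan the text; i only moves forward
    matches, j = [], 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]               # reuse the already-matched prefix
        if ch == pattern[j]:
            j += 1
        if j == m:                       # full match ending at position i
            matches.append(i - m + 1)
            j = lps[j - 1]               # continue, allowing overlapping matches
    return matches

print(kmp_search("ABABABCABABABC", "ABABC"))   # [2, 9]
```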
6. Time Complexity of KMP Algorithm
- Preprocessing LPS Table: O(m)
- Pattern Search: O(n)
- Total Complexity: O(n + m)
- Faster than Naïve Algorithm (O(m × n)) since it skips unnecessary comparisons.
7. Optimizations to KMP Algorithm
7.1 Using a Better LPS Construction
- Avoid redundant calculations while computing LPS values.
- Use a failure function to efficiently update LPS values.
7.2 Skipping Redundant Comparisons
- Modify the i and j indices based on the LPS table to reduce unnecessary checks.
7.3 Parallelizing KMP
- Divide the text into multiple segments and search in parallel for improved performance.
8. Advantages & Disadvantages of KMP Algorithm
Advantages
✔ Faster than Naïve Algorithm (O(n + m) instead of O(m × n))
✔ Efficient for long texts & repeating patterns
✔ Does not require extra space like Rabin-Karp
Disadvantages
✖ Preprocessing overhead (LPS table computation takes extra time)
✖ Not ideal for multiple pattern searching (Rabin-Karp is better for that)
✖ Performance similar to Naïve Algorithm in best case
⭐Naïve Algorithm vs. Rabin-Karp Algorithm vs. KMP Algorithm
Feature | Naïve Algorithm | Rabin-Karp Algorithm | Knuth-Morris-Pratt (KMP) Algorithm |
---|---|---|---|
Definition | Checks for a match character by character at every position in the text. | Uses hashing to compare substrings efficiently. | Uses preprocessing (LPS table) to skip unnecessary comparisons. |
Best Case Time Complexity | O(n) (Pattern found at start) | O(n) (No hash collisions) | O(n) (Few mismatches) |
Worst Case Time Complexity | O(m × n) (Every character checked) | O(m × n) (Frequent hash collisions) | O(n + m) (Preprocessing + Search) |
Preprocessing Required? | No | Yes (Compute hash for pattern) | Yes (Compute LPS table) |
Space Complexity | O(1) (No extra memory needed) | O(1) (Extra space for hash values) | O(m) (For LPS table) |
Uses Hashing? | No | Yes | No |
Uses Precomputed Table? | No | No | Yes (LPS Table) |
Works Well for | Small texts | Searching multiple patterns at once | Searching long repeating patterns |
Efficiency on Large Texts | Poor | Good (With strong hash function) | Excellent |
Can Handle Multiple Patterns? | No | Yes | No |
Best When | Pattern length is small, and text is short. | Searching multiple patterns efficiently. | Searching in large text files with repeated substrings. |
Worst When | Large text with many mismatches. | Hash collisions increase time. | Pattern has few repeated prefixes, so the LPS table gives little skipping benefit. |
Practical Use Cases | Basic text searching, log file scanning. | Plagiarism detection, spam filtering, web crawling. | DNA sequencing, large text searching, network security. |
Key Takeaways:
- Naïve Algorithm:
- Best for small texts but slow for large texts due to O(m × n) complexity.
- No preprocessing, simple implementation.
- Rabin-Karp Algorithm:
- Fast for multiple pattern searches but collisions can make it slow (O(m × n) in the worst case).
- Hash function plays a crucial role in efficiency.
- KMP Algorithm:
- Best for single pattern searches in large texts (O(n + m) time complexity).
- Preprocessing (LPS table) allows efficient skipping of redundant comparisons.
If you need fast single-pattern search, KMP is the best choice.
If you need multi-pattern searching, Rabin-Karp is better.
If the text is small, Naïve Algorithm works fine.