Unit II: String Matching Algorithms and Computational Geometry
- ⭐Sequential Search and Brute-Force String Matching
- ⭐Closest-Pair and Convex-Hull Problem
- ⭐Exhaustive Search
- ⭐Voronoi Diagrams
- ⭐Naïve String-Matching Algorithm
- 1. Introduction to Naïve String-Matching Algorithm
- 2. Uses of Naïve String-Matching Algorithm
- 3. Working Principle of Naïve String-Matching Algorithm
- 4. Time Complexity of Naïve String-Matching Algorithm
- 5. Optimizations to Naïve String Matching
- 6. Advantages & Disadvantages of Naïve String-Matching Algorithm
- ⭐Rabin-Karp Algorithm
- ⭐Knuth-Morris-Pratt (KMP) Algorithm
- 1. Introduction to Knuth-Morris-Pratt (KMP) Algorithm
- 2. Uses of Knuth-Morris-Pratt (KMP) Algorithm
- 3. Working Principle of KMP Algorithm
- 4. LPS (Longest Prefix Suffix) Table
- 5. Searching for the Pattern in the Text using KMP
- 6. Time Complexity of KMP Algorithm
- 7. Optimizations to KMP Algorithm
- 8. Advantages & Disadvantages of KMP Algorithm
- ⭐Naïve Algorithm vs. Rabin-Karp Algorithm vs. KMP Algorithm
⭐Sequential Search and Brute-Force String Matching
1. Sequential Search
1.1 Introduction to Sequential Search
- Definition:
- Sequential search is a simple searching algorithm where each element in a list or array is checked one by one until the target value is found or the end of the list is reached.
- Concept:
- Start from the first element and compare it with the target value.
- If a match is found, return the position.
- If no match is found, move to the next element.
- Repeat until the target is found or the list ends.
1.2 Uses of Sequential Search
- Unsorted Data Search:
- Works well when the list is not sorted, unlike Binary Search which requires sorted data.
- Small Data Sets:
- Suitable when the number of elements is small (as performance is O(n)).
- Finding Duplicates:
- Used when searching for multiple occurrences of a value in a list.
- Linked Lists & Memory-based Data Structures:
- When elements aren't stored in contiguous memory (as in linked lists).
- Pattern Matching:
- Used in text processing to find words or patterns in documents or files.
1.3 Types of Sequential Search
1.3.1 Iterative Sequential Search
- The simplest method, where each element is checked one by one.
- Best case: O(1) (when the target is the first element).
- Worst case: O(n) (when the target is the last element or not present).
- Start from index 0.
- Compare each element with the target value.
- If a match is found, return the index.
- If the list ends, return -1 (not found).
Example (Search for 5 in {1, 3, 5, 7, 9}):
- Compare 1 → No match.
- Compare 3 → No match.
- Compare 5 → Match found at index 2 ✅
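A minimal Python sketch of the iterative search (the function name and sample list are illustrative):

```python
def sequential_search(arr, target):
    # check each element in order until the target is found
    for i, value in enumerate(arr):
        if value == target:
            return i          # index of the first match
    return -1                 # target not present

print(sequential_search([1, 3, 5, 7, 9], 5))   # 2
```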
1.3.2 Recursive Sequential Search
- The same logic as the iterative search, but implemented recursively.
- The function calls itself until the element is found or the list ends.
- If the list is empty, return -1 (not found).
- Check the first element of the list.
- If it matches the target, return the index.
- Otherwise, search in the remaining list.
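The same search written recursively (a sketch; Python's recursion limit makes this impractical for very long lists):

```python
def recursive_search(arr, target, index=0):
    if index == len(arr):       # list exhausted: not found
        return -1
    if arr[index] == target:    # current element matches
        return index
    return recursive_search(arr, target, index + 1)   # search the rest

print(recursive_search([1, 3, 5, 7, 9], 7))   # 3
```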
1.3.3 Sentinel Search
- A small optimization of sequential search: place the target at the end of the list as a sentinel, eliminating the boundary check on every iteration.
- Store the last element and replace it with the target (sentinel).
- Start searching from the first element.
- If a match is found, return the index.
- If the sentinel is reached, restore the original last element and return -1.
Example (Search for 5 in {1, 3, 5, 7, 9} using Sentinel):
- Set sentinel at end: {1, 3, 5, 7, 5}
- Start searching → Match found before reaching the sentinel ✅
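A sketch of the sentinel variant, assuming a mutable list (the original last element is restored before returning):

```python
def sentinel_search(arr, target):
    last = arr[-1]               # save the original last element
    arr[-1] = target             # place the sentinel
    i = 0
    while arr[i] != target:      # no bounds check needed: the sentinel stops the loop
        i += 1
    arr[-1] = last               # restore the list
    if i < len(arr) - 1 or last == target:
        return i                 # genuine match
    return -1                    # only the sentinel was hit

print(sentinel_search([1, 3, 5, 7, 9], 5))   # 2
```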
2. Brute-Force String Matching
2.1 Introduction to Brute-Force String Matching
- Definition:
- A basic string searching algorithm that checks for a match character by character within a larger text.
- Concept:
- Given a text T of length n and a pattern P of length m,
- Slide P across T one character at a time and check for a complete match at each position.
2.2 Uses of Brute-Force String Matching
- Basic Text Search:
- Searching for keywords in documents, log files, and emails.
- Pattern Detection in DNA Sequences:
- Used in bioinformatics to find genetic patterns.
- Spam Filtering:
- Detecting blacklisted words in emails.
- Simple Applications where Performance is Not Critical:
- Works well in small datasets where efficiency is not a concern.
2.3 Steps of Brute-Force String Matching
- Start at index 0 in the text.
- Compare the first character of the pattern with the text.
- If all characters match, return the index.
- If a mismatch occurs, shift the pattern one position right.
- Repeat until the end of the text is reached.
2.4 Example of Brute-Force String Matching
Example (Find "abc" in "ababc"):
- Compare "abc" with "aba" → ❌ Mismatch
- Compare "abc" with "bab" → ❌ Mismatch
- Compare "abc" with "abc" → ✅ Match found at position 2
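A minimal sketch of the matcher described above (names are illustrative):

```python
def brute_force_match(text, pattern):
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):    # every possible alignment of the pattern
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1                # extend the match one character at a time
        if j == m:
            return i              # full match found at position i
    return -1

print(brute_force_match("ababc", "abc"))   # 2
```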
2.5 Time Complexity of Brute-Force String Matching
- Best Case: O(n) (if the pattern is found at the beginning).
- Worst Case: O(m × n) (if every character has to be checked).
2.6 Optimizations of Brute-Force String Matching
- Early termination: if a mismatch occurs early, skip the remaining comparisons at that position.
  - Example: Searching "xyz" in "abcdef" → stop at the first character mismatch.
- Shift heuristics: precompute pattern character frequencies to decide how far to shift next.
- Hashing: instead of comparing characters one by one, use a hash function to compare windows quickly (the idea behind the Rabin-Karp algorithm).
3. Comparison Between Sequential Search & Brute-Force String Matching
Feature | Sequential Search | Brute-Force String Matching |
---|---|---|
Definition | Search an element in a list | Search a substring in a text |
Best Case Complexity | O(1) (element at start) | O(n) (match at start) |
Worst Case Complexity | O(n) (element not found) | O(m × n) (every comparison needed) |
Data Type | Works with numbers/lists | Works with text/strings |
Use Case | Searching in arrays/lists | Searching in documents, databases |
Efficiency | Works well for small lists | Inefficient for large texts |
⭐Closest-Pair and Convex-Hull Problem
1. Closest-Pair Problem
1.1 Introduction to Closest-Pair Problem
Definition:
- The Closest-Pair Problem finds the two closest points in a given set of points on a 2D plane.
- The distance between two points P₁(x₁, y₁) and P₂(x₂, y₂) is calculated using the Euclidean Distance Formula: d(P₁, P₂) = √((x₂ − x₁)² + (y₂ − y₁)²)
Concept:
- Given n points in a 2D plane, find two points that have the smallest distance between them.
1.2 Uses of Closest-Pair Problem
- Geographical Mapping
- Used in GPS navigation and nearest location detection (e.g., "Find the nearest ATM").
- Computer Vision
- Used in image recognition for detecting closest object clusters.
- Clustering in Machine Learning
- Helps in grouping similar data points based on proximity.
- Networking & Communication Systems
- Used in wireless networks to find the nearest access point for connection.
- Astronomy & Physics Simulations
- Used in space studies to find the closest celestial bodies.
1.3 Approaches to Solve the Closest-Pair Problem
1.3.1 Brute-Force Approach
- Definition:
- The simplest method that compares every pair of points to find the closest one.
- Steps:
- Compare each point with every other point.
- Calculate the Euclidean distance for each pair.
- Keep track of the minimum distance found.
Example (Points: A(1,1), B(4,2), C(3,5)):
- Compute distances:
  - d(A, B) = √10 ≈ 3.16
  - d(A, C) = √20 ≈ 4.47
  - d(B, C) = √10 ≈ 3.16
- Closest pair = (A, B) or (B, C) with distance 3.16 ✅
- Time Complexity: O(n²)
  - Every point is compared with every other point, so the number of comparisons is n(n-1)/2.
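A brute-force sketch using the example points (itertools enumerates the n(n-1)/2 pairs):

```python
import math
from itertools import combinations

def closest_pair_brute_force(points):
    # compare every pair of points and keep the minimum distance
    best = min(combinations(points, 2),
               key=lambda pair: math.dist(pair[0], pair[1]))
    return best, math.dist(best[0], best[1])

pts = [(1, 1), (4, 2), (3, 5)]   # A, B, C from the example
print(closest_pair_brute_force(pts))
# (((1, 1), (4, 2)), 3.1622776601683795)
```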
1.3.2 Divide-and-Conquer Approach
- Definition:
- A more efficient approach that divides the set of points into two halves, finds the closest pair in each half, and merges the results.
- Steps:
- Sort the points based on x-coordinates.
- Divide the points into two halves.
- Find the closest pair in both halves recursively.
- Merge step: Find the closest pair across the two halves.
- Return the minimum distance found.
Example (Points: A(1,1), B(4,2), C(3,5), D(7,8)):
- Sort points by x-coordinate → {A(1,1), C(3,5), B(4,2), D(7,8)}
- Divide into two halves: {A, C} and {B, D}
- Find the closest pair in each half recursively.
- Merge step: check points near the dividing line for a closer cross-boundary pair.
- Time Complexity: O(n log n)
- Sorting takes O(n log n), and merging takes O(n).
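A compact divide-and-conquer sketch (simplified: the strip step relies on the standard fact that only a bounded number of y-sorted neighbors can beat the current best distance):

```python
import math

def closest_pair(points):
    pts = sorted(points)                      # sort once by x-coordinate
    def solve(p):
        if len(p) <= 3:                       # base case: brute force
            return min((math.dist(a, b), (a, b))
                       for i, a in enumerate(p) for b in p[i + 1:])
        mid = len(p) // 2
        mid_x = p[mid][0]
        best = min(solve(p[:mid]), solve(p[mid:]))
        # merge step: only points within best distance of the dividing line
        strip = sorted((q for q in p if abs(q[0] - mid_x) < best[0]),
                       key=lambda q: q[1])
        for i, a in enumerate(strip):
            for b in strip[i + 1:i + 8]:      # a constant number of neighbors suffices
                best = min(best, (math.dist(a, b), (a, b)))
        return best
    return solve(pts)

print(closest_pair([(1, 1), (4, 2), (3, 5), (7, 8)]))
# (3.1622776601683795, ((1, 1), (4, 2)))
```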
2. Convex Hull Problem
2.1 Introduction to Convex Hull
- Definition:
- A Convex Hull is the smallest convex polygon that can enclose a given set of points in a plane.
- It is like stretching a rubber band around the given points, and the points touching the band form the Convex Hull.
2.2 Uses of Convex Hull
- Geographic Information Systems (GIS)
- Used to find boundaries of a city or country on a map.
- Computer Graphics & Image Processing
- Used in object detection and shape recognition.
- Collision Detection in Games
- Used in game development for detecting boundaries of objects.
- Machine Learning & Data Clustering
- Used in defining data boundaries in unsupervised learning.
- Robotics & Navigation
- Helps robots avoid obstacles by mapping their surroundings.
2.3 Approaches to Solve the Convex Hull Problem
2.3.1 Brute-Force Approach
Definition:
- A basic method where every pair of points is checked to see if it forms part of the hull.
Steps:
- Consider all pairs of points and check if they form a hull edge.
- A point is on the hull if all other points lie on one side of the line formed by the pair.
- Collect all such boundary edges to form the convex hull.
Time Complexity: O(n³)
- Since each pair of points is checked with all other points.
2.3.2 Graham's Scan
- Definition:
- A more efficient method that sorts points based on angles and constructs the hull using a stack.
- Steps:
- Find the lowest point (P0) (smallest y-coordinate).
- Sort remaining points based on the angle with P0.
- Use a stack to construct the hull by checking left-turns and right-turns.
Example (Find Convex Hull for A(1,1), B(4,2), C(3,5), D(7,8)):
- Select the lowest point → P0 = A(1,1)
- Sort the remaining points by polar angle around P0 → {B, D, C}
- Use a stack:
  - Push A, then B, then D, then C, checking the turn direction at each step.
  - If a right turn is detected, pop the last point from the stack before pushing.
- Here every step is a left turn, so all four points end up on the hull.
- Time Complexity: O(n log n)
- Sorting takes O(n log n), and scanning takes O(n).
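A sketch of Graham's scan under the conventions above (lowest point first, counterclockwise angle sort; collinear points are simply dropped):

```python
import math

def cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means a left turn
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def graham_scan(points):
    p0 = min(points, key=lambda p: (p[1], p[0]))       # lowest point, ties by x
    rest = sorted((p for p in points if p != p0),
                  key=lambda p: math.atan2(p[1] - p0[1], p[0] - p0[0]))
    hull = [p0]
    for p in rest:
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()            # right turn (or collinear): discard the last point
        hull.append(p)
    return hull

print(graham_scan([(1, 1), (4, 2), (3, 5), (7, 8)]))
# [(1, 1), (4, 2), (7, 8), (3, 5)]
```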
2.3.3 Jarvis March (Gift Wrapping)
- Definition:
  - A method that starts at the leftmost point and repeatedly selects the next extreme point, wrapping around the set like a ribbon around a gift.
- Steps:
- Start with leftmost point.
- Select the next extreme point that forms the smallest left turn.
- Repeat until the starting point is reached.
Example Explanation (Find Convex Hull for A(1,1), B(4,2), C(3,5), D(7,8))
- Start with the leftmost point (A).
- Find the next extreme point (B).
- Find the next extreme point (D).
- Find the next extreme point (C).
- Return to the starting point to complete the hull: A → B → D → C → A.
- Time Complexity: O(nh)
- h = number of hull points, making it efficient for small hulls.
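A matching gift-wrapping sketch (assumes at least two distinct points and no collinear hull edges, to keep it short):

```python
def cross(o, a, b):
    # > 0: left turn, < 0: right turn, 0: collinear
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def jarvis_march(points):
    start = min(points)                 # leftmost point (ties broken by y)
    hull, p = [], start
    while True:
        hull.append(p)
        q = points[0] if points[0] != p else points[1]
        for r in points:
            if cross(p, q, r) < 0:      # r lies to the right of p->q: q is not extreme
                q = r
        p = q
        if p == start:                  # wrapped all the way around
            return hull

print(jarvis_march([(1, 1), (4, 2), (3, 5), (7, 8)]))
# [(1, 1), (4, 2), (7, 8), (3, 5)]
```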
⭐Exhaustive Search
1. Introduction to Exhaustive Search
Definition:
- Exhaustive search (also called brute-force search) is a problem-solving approach where all possible solutions are systematically explored to find the best or correct one.
- It is guaranteed to find the optimal solution but may be computationally expensive for large inputs.
Concept:
- Given a problem, try every possible combination and select the best one based on the given constraints.
- Used when no efficient algorithm exists or when ensuring optimality is necessary.
2. Uses of Exhaustive Search
- Finding the Best Solution in Optimization Problems
- Example: Traveling Salesman Problem (TSP) – Checking all routes to find the shortest path.
- Brute-Force Pattern Matching
- Example: Checking all possible positions of a substring within a text.
- Breaking Cryptographic Algorithms
- Example: Brute-force password cracking by trying every combination.
- Solving NP-Complete Problems
- Example: Graph Coloring, Knapsack Problem, and other combinatorial problems.
- Path Finding in Games & AI
- Example: Finding the best move in Chess by evaluating all possibilities.
3. Types of Exhaustive Search
3.1 Complete Exhaustive Search
- Tries all possible solutions and guarantees an optimal answer.
- Example: Checking all paths in a maze to find the shortest one.
3.2 Partial Exhaustive Search
- Tries a subset of solutions when full exploration is impractical.
- Example: AI in Chess, where only likely good moves are explored.
3.3 Optimized Exhaustive Search
- Applies pruning techniques to eliminate unlikely solutions.
- Example: Branch and Bound, which skips unnecessary calculations.
4. Operations in Exhaustive Search
4.1 Generating All Possible Solutions
- Definition:
- Systematically list all potential candidates that could be a solution.
Logic to Generate All Solutions
- Identify the set of possible choices.
- Construct a recursive or iterative approach to list all possible combinations.
- Store each generated solution for evaluation.
Example (Generate all subsets of {1, 2, 3})
Possible subsets: { {}, {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3} } ✅
Time Complexity: O(2ⁿ) (each element can be included or excluded).
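A short sketch of subset generation with itertools (one of several ways to enumerate all 2ⁿ subsets):

```python
from itertools import combinations

def all_subsets(items):
    # choose every possible subset size r, then every combination of that size
    return [set(c) for r in range(len(items) + 1)
                   for c in combinations(items, r)]

print(all_subsets([1, 2, 3]))
# [set(), {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}, {1, 2, 3}]
```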
4.2 Checking Feasibility of Each Solution
- Definition:
- After generating a solution, check whether it satisfies the problem’s constraints.
Logic to Check Feasibility
- Verify that the solution does not violate problem constraints.
- If invalid, discard the solution and proceed to the next.
Example (Knapsack Problem - Can a subset of weights sum to 10?)
Given weights {4, 6, 8, 3}, check subsets:
- {4, 6} → 10 ✅
- {8, 3} → 11 ❌ (Discarded)
Time Complexity: O(2ⁿ) (checking all subsets).
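A feasibility-check sketch for this example (returns the first subset that sums to the target, or None):

```python
from itertools import combinations

def subset_with_sum(weights, target):
    # exhaustively test every non-empty subset against the constraint
    for r in range(1, len(weights) + 1):
        for subset in combinations(weights, r):
            if sum(subset) == target:
                return subset          # first feasible solution found
    return None                        # no subset satisfies the constraint

print(subset_with_sum([4, 6, 8, 3], 10))   # (4, 6)
```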
4.3 Selecting the Best Solution
- Definition:
- After checking feasibility, pick the optimal solution based on the given criteria.
Logic to Select the Best Solution
- Keep track of the best (or shortest, lowest cost, highest value) solution found so far.
- Compare each new solution with the current best solution.
- If better, update the best solution.
Example (Find the shortest route among all possible paths in a graph)
Paths: {A→B→C = 10, A→C→B = 8, A→B→D→C = 15}
Best solution: {A→C→B = 8} ✅
Time Complexity: O(n!) (all paths are checked).
5. Example Problems Using Exhaustive Search
5.1 Traveling Salesman Problem (TSP)
- Problem Statement:
- A salesman must visit n cities, traveling the shortest possible route while visiting each city exactly once and returning to the starting point.
- Solution Using Exhaustive Search:
- Generate all possible city visit orders.
- Compute the total travel distance for each route.
- Choose the shortest route.
- Time Complexity: O(n!) (since all permutations of cities are checked).
Example (TSP for 3 cities: A, B, C)
- Possible routes:
{A→B→C→A, A→C→B→A, B→A→C→B, ...}
- Compute distances for all routes.
- Select the shortest route ✅.
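A brute-force TSP sketch; the distance table below is hypothetical, just to make the example runnable:

```python
from itertools import permutations

dist = {('A', 'B'): 4, ('A', 'C'): 3, ('B', 'C'): 2}   # assumed symmetric distances

def d(u, v):
    return dist.get((u, v)) or dist[(v, u)]

def tsp_brute_force(cities, start):
    best_route, best_len = None, float('inf')
    # try every ordering of the remaining cities
    for perm in permutations([c for c in cities if c != start]):
        route = (start,) + perm + (start,)
        length = sum(d(route[i], route[i + 1]) for i in range(len(route) - 1))
        if length < best_len:
            best_route, best_len = route, length
    return best_route, best_len

print(tsp_brute_force(['A', 'B', 'C'], 'A'))
# (('A', 'B', 'C', 'A'), 9): with 3 cities, both directions tie at 9
```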
5.2 Knapsack Problem (Subset-Sum Problem)
- Problem Statement:
- Given n items with weights and values, find the subset that maximizes value without exceeding a given weight limit W.
- Solution Using Exhaustive Search:
- Generate all possible subsets of items.
- Compute the total weight and value for each subset.
- Select the subset with the highest value that satisfies the weight limit.
- Time Complexity: O(2ⁿ).
Example (Items: {A(3kg, $6), B(2kg, $5), C(4kg, $8)}, Weight Limit = 5kg)
- Possible subsets:
{ {}, {A}, {B}, {C}, {A,B}, {A,C}, {B,C}, {A,B,C} }
- Compute total weight and value for each subset.
- Select the best subset within 5kg ✅.
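A sketch of the exhaustive check for these items (weights in kg, values in dollars):

```python
from itertools import combinations

items = [('A', 3, 6), ('B', 2, 5), ('C', 4, 8)]   # (name, weight, value)
LIMIT = 5

best_subset, best_value = (), 0
for r in range(len(items) + 1):
    for subset in combinations(items, r):
        weight = sum(w for _, w, _ in subset)
        value = sum(v for _, _, v in subset)
        if weight <= LIMIT and value > best_value:   # feasible and better
            best_subset, best_value = subset, value

print(best_subset, best_value)
# (('A', 3, 6), ('B', 2, 5)) 11: A + B weighs exactly 5 kg and is worth $11
```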
5.3 Graph Coloring Problem
- Problem Statement:
- Assign colors to graph vertices such that no two adjacent vertices share the same color, using the fewest colors possible.
- Solution Using Exhaustive Search:
- Try all color assignments for the vertices.
- If a coloring violates adjacency constraints, discard it.
- Choose the solution with the minimum colors.
- Time Complexity: O(kⁿ) (where k is the number of colors).
Example (Graph with vertices A, B, C, and edges A-B, B-C, C-A)
- Possible colorings:
  - {A=Red, B=Green, C=Blue} ✅
  - {A=Red, B=Red, C=Blue} ❌ (Invalid: A & B have the same color.)
- Select the valid coloring with the fewest colors ✅.
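A sketch that tries 1 color, then 2, then 3 over all assignments and returns the first valid coloring (vertices and edges from the example):

```python
from itertools import product

def min_coloring(vertices, edges, colors):
    # grow the palette one color at a time, exhaustively testing assignments
    for k in range(1, len(colors) + 1):
        for assignment in product(colors[:k], repeat=len(vertices)):
            coloring = dict(zip(vertices, assignment))
            if all(coloring[u] != coloring[v] for u, v in edges):
                return coloring        # first valid coloring with k colors
    return None

print(min_coloring(['A', 'B', 'C'],
                   [('A', 'B'), ('B', 'C'), ('C', 'A')],
                   ['Red', 'Green', 'Blue']))
# {'A': 'Red', 'B': 'Green', 'C': 'Blue'}: a triangle needs all 3 colors
```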
6. Limitations of Exhaustive Search
- Exponential Growth
- In large datasets, checking all possibilities becomes impractical.
- Computationally Expensive
- Problems like TSP and Knapsack take O(n!) or O(2ⁿ) time.
- Inefficiency in Real-World Applications
- Other algorithms (e.g., Dynamic Programming, Greedy Algorithms) are faster and more efficient.
7. Optimizing Exhaustive Search
- Backtracking: Eliminates unnecessary paths early (e.g., N-Queens Problem).
- Branch and Bound: Cuts off suboptimal branches in optimization problems.
- Heuristic Approaches: Uses approximation (e.g., Genetic Algorithms, Simulated Annealing).
⭐Voronoi Diagrams
1. Introduction to Voronoi Diagrams
- Definition:
- A Voronoi diagram is a geometric data structure that partitions a plane into regions based on proximity to a given set of points.
- Each region contains all points that are closer to one specific seed point than any other.
- Concept:
- Given a set of n points (called sites or generators) in a 2D plane, the plane is divided into n convex polygons where:
- Each polygon contains points closest to a specific site.
- The boundaries between regions are perpendicular bisectors of the segments between seed points.
Example Explanation
- Suppose we have three points:
- A(1,3), B(4,5), C(6,2)
- The Voronoi diagram consists of three regions, where:
- Region A contains all points closest to A.
- Region B contains all points closest to B.
- Region C contains all points closest to C.
- The boundaries between these regions are straight lines equidistant from two seed points.
2. Uses of Voronoi Diagrams
- Geographical Mapping & Nearest Neighbor Search
- Used in city planning, cell tower placement, and nearest hospital detection.
- Computational Geometry & Artificial Intelligence
- Used in path planning for robots and AI-based decision-making.
- Network Optimization & Wireless Communication
- Helps in optimal placement of cell towers to maximize coverage.
- Medical Imaging & Biology
- Used in analyzing tissue structures and cell distribution.
- Computer Graphics & Game Development
- Used in procedural terrain generation and AI navigation systems.
- Astronomy & Physics Simulations
- Used for modeling galaxy distributions and crystal structures.
3. Types of Voronoi Diagrams
3.1 Ordinary Voronoi Diagram (Basic Type)
- Each point has one region where all points are closest to it.
3.2 Weighted Voronoi Diagram
- Some points have higher influence (priority) and expand their regions accordingly.
- Used in meteorology for rain and wind distribution modeling.
3.3 Additively Weighted Voronoi Diagram (Power Diagram)
- The distance is modified by an additive weight, changing region sizes.
3.4 Multiplicatively Weighted Voronoi Diagram
- Each site has a scaling factor, adjusting distance computations.
- Used in economic models where regions represent market dominance.
3.5 K-Order Voronoi Diagram
- Instead of finding the closest site, this finds the k nearest sites to each point.
- Used in logistics and emergency service planning.
3.6 Farthest-Point Voronoi Diagram
- Instead of dividing space by closest points, it partitions based on farthest points.
- Used in evacuation route planning.
4. Construction of Voronoi Diagrams
4.1 Brute-Force Construction (O(n²))
- Definition:
- The simplest approach that checks the distance of every point to all given sites.
- Steps:
- For each pixel or grid point in the plane, compute its distance to all given sites.
- Assign the point to the closest site.
- The boundaries where distances are equal form Voronoi edges.
- Time Complexity: O(n²)
- Since every point checks its distance to all other points.
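A grid-based sketch of the brute-force construction (every cell is assigned to its nearest site; the grid size is illustrative):

```python
import math

def voronoi_grid(sites, width, height):
    # assign each grid cell to the closest site: O(width x height x n)
    return {(x, y): min(sites, key=lambda s: math.dist((x, y), s))
            for x in range(width) for y in range(height)}

sites = [(1, 3), (4, 5), (6, 2)]     # A, B, C from the example above
regions = voronoi_grid(sites, 8, 8)
print(regions[(0, 0)])               # (1, 3): the origin falls in A's region
```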
4.2 Fortune’s Algorithm (O(n log n))
- Definition:
- A sweep-line algorithm that constructs Voronoi diagrams efficiently in O(n log n) time.
- Steps:
- Sort all points by their y-coordinates.
- Use a parabolic sweep-line from top to bottom to maintain a dynamic beach line.
- When the sweep-line encounters a site, insert an arc into the beach line.
- When it encounters a Voronoi edge intersection, remove an arc.
- Store the boundary points to construct the final Voronoi diagram.
- Time Complexity: O(n log n)
- Sorting takes O(n log n), and maintaining the beach line takes O(n).
Example Explanation
- Given points A(2,3), B(5,7), C(8,2):
- Start sweeping top to bottom.
- Maintain a parabolic beach line for active sites.
- Insert Voronoi boundaries at equal-distance intersections.
- Continue until the entire plane is partitioned.
4.3 Incremental Algorithm (O(n log n))
- Definition:
- Adds one site at a time and updates the Voronoi diagram incrementally.
- Steps:
- Start with a single site.
- Insert new points one by one.
- Adjust Voronoi boundaries dynamically.
- Continue until all points are added.
- Time Complexity: O(n log n)
- Since each insertion requires local adjustments in O(log n) time.
5. Operations in Voronoi Diagrams
5.1 Finding Nearest Neighbor (Point Location Query)
- Definition:
- Given a query point, determine which Voronoi region it belongs to.
- Logic:
- Compute the distance of the query point to each site.
- Find the minimum distance and assign the point to that region.
- Time Complexity: O(log n) (using a spatial search structure).
Example Explanation (Find nearest point to Q(3,4))
- Given sites A(1,1), B(5,5), C(7,3),
- Compute distances:
- d(Q, A) = 3.6
- d(Q, B) = 2.2 ✅ (Nearest)
- d(Q, C) = 4.1
- Point Q belongs to Voronoi region of B ✅.
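A brute-force version of this query (O(n) per query; the O(log n) bound above assumes a spatial index such as a k-d tree):

```python
import math

sites = {'A': (1, 1), 'B': (5, 5), 'C': (7, 3)}
q = (3, 4)

# point location by direct distance comparison
nearest = min(sites, key=lambda name: math.dist(q, sites[name]))
print(nearest, round(math.dist(q, sites[nearest]), 1))   # B 2.2
```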
5.2 Voronoi Edge Detection
- Definition:
- Finds boundaries between Voronoi regions.
- Logic:
- Compute perpendicular bisectors between each pair of sites.
- Find their intersection points.
- Store these points as Voronoi edges.
- Time Complexity: O(n log n)
5.3 Merging Two Voronoi Diagrams
- Definition:
- Combines two separate Voronoi diagrams into one.
- Logic:
- Compute Voronoi diagrams independently.
- Overlay them on the same plane.
- Merge edges and resolve intersections.
- Time Complexity: O(n log n)
6. Limitations of Voronoi Diagrams
- Sensitive to Input Data
- Small changes in site positions drastically alter Voronoi regions.
- Computationally Expensive for Large Data Sets
- O(n log n) complexity still grows large for millions of sites.
- Not Suitable for Dynamic Environments
- Requires recomputation if points are frequently added/removed.
⭐Naïve String-Matching Algorithm
1. Introduction to Naïve String-Matching Algorithm
- Definition:
- The Naïve String-Matching Algorithm is a basic pattern searching algorithm that checks for a match character by character within a given text.
- It slides the pattern one character at a time over the text and compares each substring with the pattern.
- Concept:
- Given a text T of length n and a pattern P of length m, the algorithm compares P with every substring of T.
- If a match is found, return the index of occurrence.
Example Explanation
- Given:
- Text (T) = "ABABABCABABABC"
- Pattern (P) = "ABABC"
- The algorithm starts comparing the pattern with each substring of the text:
  - Compare "ABABC" with "ABABA" → Mismatch (shift right)
  - Compare "ABABC" with "BABAB" → Mismatch (shift right)
  - Compare "ABABC" with "ABABC" → Match found at index 2 ✅
2. Uses of Naïve String-Matching Algorithm
- Text Searching & Information Retrieval
- Finding words in documents (e.g., searching for "error" in a log file).
- Pattern Recognition in DNA Sequences
- Searching for genetic patterns in DNA sequences.
- Spam Detection & Text Filtering
- Detecting blacklisted words in emails.
- String Searching in Databases
- Locating specific records in large datasets.
- Search Operations in Text Editors
- Find and replace functionalities in word processors.
3. Working Principle of Naïve String-Matching Algorithm
3.1 Steps of Naïve String Matching
- Start with index 0 in the text.
- Compare the first character of the pattern with the current character in the text.
- If a character mismatch occurs, shift the pattern one step to the right.
- If all characters match, return the index of occurrence.
- Repeat until the end of the text is reached.
3.2 Example of Naïve String-Matching Algorithm
Example (Find "AB" in "ABACABAB")
Step-by-step matching process:
- Compare "AB" with "AB" → ✅ Match found at index 0.
- Shift right and compare "AB" with "BA" → ❌ Mismatch.
- Shift right and compare "AB" with "AC" → ❌ Mismatch.
- Shift right and compare "AB" with "CA" → ❌ Mismatch.
- Shift right and compare "AB" with "AB" → ✅ Match found at index 4.
- Shift right and compare "AB" with "BA" → ❌ Mismatch.
- Shift right and compare "AB" with "AB" → ✅ Match found at index 6.
Final Output: Pattern occurs at indices {0, 4, 6} ✅
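A sketch of the naïve matcher that collects every occurrence index (names are illustrative):

```python
def naive_search(text, pattern):
    n, m = len(text), len(pattern)
    matches = []
    for i in range(n - m + 1):          # each alignment of the pattern
        if text[i:i + m] == pattern:    # character-by-character comparison
            matches.append(i)
    return matches

print(naive_search("ABACABAB", "AB"))   # [0, 4, 6]
```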
4. Time Complexity of Naïve String-Matching Algorithm
- Best Case: O(n)
  - Occurs when the pattern matches within the first few comparisons or the first letter of the pattern rarely appears in the text.
- Worst Case: O(m × n)
  - Occurs when nearly every character has to be checked (e.g., searching for "AAA" in "AAAAA").
5. Optimizations to Naïve String Matching
5.1 Early Termination Optimization
- If a mismatch occurs early, skip the remaining comparisons and shift right.
- Example: Searching "XYZ" in "ABCDEF" → stop at the first mismatch.
5.2 Using Frequency Analysis
- If the last character of the pattern is rare in the text, shift multiple steps at once.
- Example: Searching "ZZZ" in "ABCDEFGH" → shift immediately after the first mismatch.
5.3 Use of Hashing (Similar to Rabin-Karp Algorithm)
- Instead of comparing character by character, use a hash function to quickly compare patterns.
6. Advantages & Disadvantages of Naïve String-Matching Algorithm
Advantages
✔ Simple & Easy to Implement
✔ Works well for small texts & patterns
✔ No extra memory needed (compared to KMP)
Disadvantages
✖ Inefficient for Large Texts (O(m × n) worst case)
✖ Repeated Comparisons increase time complexity
✖ Not practical for long pattern searches
⭐Rabin-Karp Algorithm
1. Introduction to Rabin-Karp Algorithm
Definition:
- The Rabin-Karp Algorithm is a string-searching algorithm that uses hashing to efficiently search for a pattern in a text.
- Instead of comparing characters one by one like the Naïve approach, it computes a hash value for the pattern and substrings of the text.
- If the hash values match, it performs a character-by-character check to confirm a valid match.
Concept:
- Given a text T of length n and a pattern P of length m, the algorithm:
  - Computes the hash of P (the pattern).
  - Computes the hash of the first m characters of T (the initial window).
  - Slides the window over the text one position at a time, updating the hash value dynamically.
  - If the hashes match, performs a character-by-character check to confirm.
2. Uses of Rabin-Karp Algorithm
- String Searching in Large Texts
- Used in search engines to match queries quickly.
- Detecting Plagiarism
- Used to compare chunks of text across documents.
- Pattern Matching in DNA Sequences
- Used in bioinformatics to locate gene sequences.
- Spam Filtering
- Identifies spam messages based on word patterns.
- Searching Multiple Patterns in a Single Pass
- Unlike Naïve and KMP algorithms, Rabin-Karp can search multiple patterns at once.
3. Working Principle of Rabin-Karp Algorithm
3.1 Steps of Rabin-Karp Algorithm
- Compute the hash of the pattern.
- Compute the hash of the first substring of the text.
- Slide the window over the text one character at a time, updating the hash dynamically.
- If the hash matches, perform a character-by-character check to confirm the match.
- Continue until the entire text is scanned.
3.2 Example of Rabin-Karp Algorithm
Example (Find "AB" in "ABACABAB")
- Given:
- Text (T) = "ABACABAB"
- Pattern (P) = "AB"
- Step-by-step process:
  - Compute the hash of "AB" → Hash(P) = H1
  - Compute the hash of the first two characters of T ("AB") → Hash(T[0:2]) = H2
  - Compare H1 and H2:
    - If H1 ≠ H2, shift the pattern right and update the hash.
    - If H1 = H2, do a character-by-character check to confirm.
  - Repeat until the end of the text.
Final Output: Pattern occurs at indices {0, 4, 6} ✅
4. Hash Function in Rabin-Karp Algorithm
- Definition:
- A hash function is used to convert a string into a numerical value for quick comparisons.
- Rolling Hash Technique:
- Instead of recalculating the entire hash from scratch for each shift, use a rolling hash function to update the hash in O(1) time.
Rolling Hash Formula (Modular Arithmetic)
hash(T[s+1 .. s+m]) = (d × (hash(T[s .. s+m−1]) − T[s] × d^(m−1)) + T[s+m]) mod q
Where:
- d = base (e.g., 256 for ASCII characters).
- m = length of the pattern.
- q = a prime number to reduce hash collisions.
Example Explanation (Sliding Hash Update)
- Assume P = "AB" and T = "ABACABAB".
- Compute the hash of "AB" (H1).
- Compute the hash of "BA" (H2) by removing the first character's contribution and adding the next character.
- Compute the hash of "AC" (H3) similarly.
- Use the modulo operation (% q) to keep values small and reduce collisions.
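A sketch of Rabin-Karp with the rolling hash above (d = 256 and q = 101 are conventional illustrative choices):

```python
def rabin_karp(text, pattern, d=256, q=101):
    n, m = len(text), len(pattern)
    h = pow(d, m - 1, q)                 # d^(m-1) mod q, used to drop the leading char
    p_hash = t_hash = 0
    for i in range(m):                   # hashes of the pattern and the first window
        p_hash = (d * p_hash + ord(pattern[i])) % q
        t_hash = (d * t_hash + ord(text[i])) % q
    matches = []
    for s in range(n - m + 1):
        if p_hash == t_hash and text[s:s + m] == pattern:
            matches.append(s)            # hash hit, confirmed character by character
        if s < n - m:                    # roll the hash to the next window
            t_hash = (d * (t_hash - ord(text[s]) * h) + ord(text[s + m])) % q
    return matches

print(rabin_karp("ABACABAB", "AB"))      # [0, 4, 6]
```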
5. Time Complexity of Rabin-Karp Algorithm
- Best Case: O(n)
- Occurs when there are few hash collisions, requiring fewer character comparisons.
- Worst Case: O(m × n)
- Occurs when hash collisions happen frequently, leading to multiple character comparisons.
6. Optimizations to Rabin-Karp Algorithm
6.1 Using a Good Hash Function
- A prime number (q) helps reduce hash collisions.
- Using higher base values (d) minimizes repeated hash values.
6.2 Avoiding Hash Collisions
- If multiple substrings have the same hash, do a full character comparison before confirming a match.
- Increase q to a larger prime number to reduce duplicate hash values.
6.3 Searching for Multiple Patterns Simultaneously
- Store hashes of multiple patterns and compare them in a single pass.
- Useful in plagiarism detection and spam filtering.
7. Advantages & Disadvantages of Rabin-Karp Algorithm
Advantages
✔ Efficient for multiple pattern searching
✔ Faster than Naïve algorithm in best case (O(n))
✔ Can be used in large-scale applications like web search engines
Disadvantages
✖ Hash collisions can cause O(m × n) worst-case time complexity
✖ Hash calculations require additional computation
✖ Not as fast as KMP in single-pattern search
⭐Knuth-Morris-Pratt (KMP) Algorithm
1. Introduction to Knuth-Morris-Pratt (KMP) Algorithm
Definition:
- The Knuth-Morris-Pratt (KMP) Algorithm is a string-searching algorithm that efficiently finds occurrences of a pattern (P) in a text (T) using preprocessing to skip unnecessary comparisons.
- Unlike Naïve String Matching, KMP avoids redundant re-evaluations by utilizing a precomputed table (LPS - Longest Prefix Suffix Table) to determine shifts.
Concept:
- Instead of moving back in the text upon a mismatch, KMP uses the LPS table to determine how much the pattern should be shifted forward.
- This makes the search process more efficient, especially for long texts.
Example Explanation
- Given:
- Text (T) = "ABABABCABABABC"
- Pattern (P) = "ABABC"
- Steps in KMP:
  - Precompute the LPS table for the pattern "ABABC".
  - Begin searching for the pattern in the text.
  - If a mismatch occurs, use the LPS table to determine how far to shift instead of restarting from scratch.
  - Continue the process until the pattern is found or the text ends.
2. Uses of Knuth-Morris-Pratt (KMP) Algorithm
- Fast String Searching in Large Texts
- Used in search engines, file searching, and text editors.
- DNA Pattern Matching
- Used in bioinformatics to match DNA sequences efficiently.
- Spam Filtering & Web Crawling
- Used in detecting keywords in emails and parsing webpages.
- Intrusion Detection Systems
- Detects malware signatures by checking for specific bit patterns.
- Network Security & Packet Filtering
- Used to scan network traffic for specific attack signatures.
3. Working Principle of KMP Algorithm
3.1 Steps of KMP Algorithm
- Precompute the LPS (Longest Prefix Suffix) Table for the pattern.
- Start comparing the pattern with the text from left to right.
- If a mismatch occurs, use the LPS table to skip unnecessary comparisons.
- Continue until the entire text is searched.
4. LPS (Longest Prefix Suffix) Table
4.1 What is the LPS Table?
- The LPS table stores the length of the longest proper prefix that is also a suffix for each prefix of the pattern.
- It helps in determining the shift upon a mismatch.
4.2 Steps to Construct the LPS Table
- Initialize an array LPS of size m (the length of the pattern).
- Set LPS[0] = 0 (a single character has no proper prefix that is also a suffix).
- Compare prefixes and suffixes for each prefix of the pattern.
- If a match is found, increment the LPS value.
- If a mismatch occurs, fall back to the previous LPS value to decide the shift.
4.3 Example of LPS Table Construction
Example (Pattern: "ABABC")
Index | Pattern | LPS Value | Explanation |
---|---|---|---|
0 | A | 0 | No proper prefix == suffix |
1 | AB | 0 | No proper prefix == suffix |
2 | ABA | 1 | "A" is prefix & suffix |
3 | ABAB | 2 | "AB" is prefix & suffix |
4 | ABABC | 0 | "C" breaks the match |
Final LPS Table: [0, 0, 1, 2, 0] ✅
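A sketch of the LPS construction matching these steps:

```python
def build_lps(pattern):
    m = len(pattern)
    lps = [0] * m
    length = 0                 # length of the current matching prefix-suffix
    i = 1
    while i < m:
        if pattern[i] == pattern[length]:
            length += 1        # the border extends by one character
            lps[i] = length
            i += 1
        elif length > 0:
            length = lps[length - 1]   # fall back to the previous shorter border
        else:
            lps[i] = 0
            i += 1
    return lps

print(build_lps("ABABC"))      # [0, 0, 1, 2, 0]
```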
5. Searching for the Pattern in the Text using KMP
5.1 Steps for Pattern Matching
- Initialize pointers: i for the text, j for the pattern.
- If T[i] == P[j], increment both i and j.
- If j == m (the length of the pattern), a match is found at index i − j; set j = LPS[j − 1] and continue searching.
- If T[i] != P[j] and j > 0, set j = LPS[j − 1] instead of moving i backwards.
- If T[i] != P[j] and j == 0, move i forward.
5.2 Example of Pattern Search Using KMP
Example (Find "ABABC" in "ABABABCABABABC")
- Compute the LPS table: [0, 0, 1, 2, 0].
- Begin searching:
  - "ABABC" matches the substring starting at index 2 ✅.
  - On a later mismatch at "C", shift the pattern using the LPS value instead of restarting.
  - Continue scanning → another match found at index 9 ✅.
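A self-contained KMP sketch (it rebuilds the LPS table internally, then scans without ever moving the text pointer backwards):

```python
def kmp_search(text, pattern):
    # build the LPS (failure) table, as in section 4
    m = len(pattern)
    lps, length = [0] * m, 0
    for i in range(1, m):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length
    # scan the text; i only moves forward
    matches, j = [], 0
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]               # reuse the already-matched prefix
        if ch == pattern[j]:
            j += 1
        if j == m:                       # full match ending at position i
            matches.append(i - m + 1)
            j = lps[j - 1]               # continue, allowing overlapping matches
    return matches

print(kmp_search("ABABABCABABABC", "ABABC"))   # [2, 9]
```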
6. Time Complexity of KMP Algorithm
- Preprocessing LPS Table: O(m)
- Pattern Search: O(n)
- Total Complexity: O(n + m)
- Faster than Naïve Algorithm (O(m × n)) since it skips unnecessary comparisons.
7. Optimizations to KMP Algorithm
7.1 Using a Better LPS Construction
- Avoid redundant calculations while computing LPS values.
- Use a failure function to efficiently update LPS values.
7.2 Skipping Redundant Comparisons
- Modify the i and j indices based on the LPS table to reduce unnecessary checks.
7.3 Parallelizing KMP
- Divide the text into multiple segments and search in parallel for improved performance.
8. Advantages & Disadvantages of KMP Algorithm
Advantages
✔ Faster than Naïve Algorithm (O(n + m) instead of O(m × n))
✔ Efficient for long texts & repeating patterns
✔ Does not require extra space like Rabin-Karp
Disadvantages
✖ Preprocessing overhead (LPS table computation takes extra time)
✖ Not ideal for multiple pattern searching (Rabin-Karp is better for that)
✖ Performance similar to Naïve Algorithm in best case
⭐Naïve Algorithm vs. Rabin-Karp Algorithm vs. KMP Algorithm
Feature | Naïve Algorithm | Rabin-Karp Algorithm | Knuth-Morris-Pratt (KMP) Algorithm |
---|---|---|---|
Definition | Checks for a match character by character at every position in the text. | Uses hashing to compare substrings efficiently. | Uses preprocessing (LPS table) to skip unnecessary comparisons. |
Best Case Time Complexity | O(n) (Pattern found at start) | O(n) (No hash collisions) | O(n) (Few mismatches) |
Worst Case Time Complexity | O(m × n) (Every character checked) | O(m × n) (Frequent hash collisions) | O(n + m) (Preprocessing + Search) |
Preprocessing Required? | No | Yes (Compute hash for pattern) | Yes (Compute LPS table) |
Space Complexity | O(1) (No extra memory needed) | O(1) (Extra space for hash values) | O(m) (For LPS table) |
Uses Hashing? | No | Yes | No |
Uses Precomputed Table? | No | No | Yes (LPS Table) |
Works Well for | Small texts | Searching multiple patterns at once | Searching long repeating patterns |
Efficiency on Large Texts | Poor | Good (With strong hash function) | Excellent |
Can Handle Multiple Patterns? | No | Yes | No |
Best When | Pattern length is small, and text is short. | Searching multiple patterns efficiently. | Searching in large text files with repeated substrings. |
Worst When | Large text with many mismatches. | Hash collisions increase time. | Pattern has few repeated prefixes, so the LPS table gives little skipping benefit. |
Practical Use Cases | Basic text searching, log file scanning. | Plagiarism detection, spam filtering, web crawling. | DNA sequencing, large text searching, network security. |
Key Takeaways:
- Naïve Algorithm:
- Best for small texts but slow for large texts due to O(m × n) complexity.
- No preprocessing, simple implementation.
- Rabin-Karp Algorithm:
- Fast for multiple pattern searches but collisions can make it slow (O(m × n) in the worst case).
- Hash function plays a crucial role in efficiency.
- KMP Algorithm:
- Best for single pattern searches in large texts (O(n + m) time complexity).
- Preprocessing (LPS table) allows efficient skipping of redundant comparisons.
If you need fast single-pattern search, KMP is the best choice.
If you need multi-pattern searching, Rabin-Karp is better.
If the text is small, Naïve Algorithm works fine.