Algoritmo Rabin-Karp para pesquisa de padrões

Dado um texto txt [0..n-1] e um padrão pat [0..m-1] , escreva uma pesquisa de função (char pat [], char txt []) que imprime todas as ocorrências de pat [] em txt [] . Você pode assumir que n> m.

Exemplos:

Input:  txt[] = "THIS IS A TEST TEXT"
        pat[] = "TEST"
Output: Pattern found at index 10

Input:  txt[] =  "AABAACAADAABAABA"
        pat[] =  "AABA"
Output: Pattern found at index 0
        Pattern found at index 9
        Pattern found at index 12

procura de padrões

O algoritmo Naive String Matching desliza o padrão um por um. Após cada slide, ele verifica um por um os caracteres no turno atual e, se todos os caracteres correspondem, imprime a correspondência.
Como o algoritmo ingênuo, o algoritmo Rabin-Karp também desliza o padrão um por um. Mas, ao contrário do algoritmo Naive, o algoritmo Rabin Karp combina o valor hash do padrão com o valor hash da substring atual do texto e, se os valores hash corresponderem, apenas ele começa a corresponder a caracteres individuais. Portanto, o algoritmo de Rabin Karp precisa calcular os valores de hash para as sequências de caracteres.
1) Padrão em si.
2) Todas as substrings do texto de comprimento m.

Visto que precisamos calcular com eficiência os valores de hash para todas as substrings de tamanho m de texto, devemos ter uma função de hash com a seguinte propriedade.
O hash no próximo turno deve ser computável de forma eficiente a partir do valor de hash atual e próximo caractere no texto ou podemos dizer que o hash (txt [s + 1 .. s + m]) deve ser computável de forma eficiente a partir do hash (txt [s .. s + m-1]) e txt [s + m] ou seja, hash (txt [s + 1 .. s + m]) = rehash (txt [s + m], hash (txt [s .. s + m- 1])) e rehash deve ser operação O (1).
A função hash sugerida por Rabin e Karp calcula um valor inteiro. O valor inteiro de uma string é o valor numérico de uma string.

Por exemplo, se todos os caracteres possíveis forem de 1 a 10, o valor numérico de “122” será 122. O número de caracteres possíveis é maior do que 10 (256 em geral) e o comprimento do padrão pode ser grande. Portanto, os valores numéricos não podem ser armazenados praticamente como um número inteiro. Portanto, o valor numérico é calculado usando aritmética modular para garantir que os valores de hash possam ser armazenados em uma variável inteira (pode caber em palavras da memória). Para refazer o hash, precisamos retirar o dígito mais significativo e adicionar o novo dígito menos significativo para o valor do hash. A reformulação é feita usando a seguinte fórmula.

hash (txt [s + 1 .. s + m]) = (d (hash (txt [s .. s + m-1]) - txt [s] * h) + txt [s + m]) mod q
hash (txt [s .. s + m-1]) : Valor de hash em shift s .
hash (txt [s + 1 .. s + m]) : Valor de hash no próximo turno (ou shift s +1)
d : Número de caracteres no alfabeto
q : Um número primo
h: d ^ (m-1)

Como funciona a expressão acima?

Isso é matemática simples, calculamos o valor decimal da janela atual da janela anterior.
Por exemplo, o comprimento do padrão é 3 e a string é “23456”
Você calcula o valor da primeira janela (que é “234”) como 234.
Como você vai calcular o valor da próxima janela “345”? Você fará (234 - 2 * 100) * 10 + 5 e obterá 345.

/* Following program is a C++ implementation of Rabin Karp
Algorithm given in the CLRS book */
#include <bits/stdc++.h>
using namespace std;
 
// d is the number of characters in the input alphabet
#define d 256
 
/* pat -> pattern
    txt -> text
    q -> A prime number
*/
void search(char pat[], char txt[], int q)
{
    int M = strlen(pat);
    int N = strlen(txt);
    int i, j;
    int p = 0; // hash value for pattern
    int t = 0; // hash value for txt
    int h = 1;
 
    // The value of h would be "pow(d, M-1)%q"
    for (i = 0; i < M - 1; i++)
        h = (h * d) % q;
 
    // Calculate the hash value of pattern and first
    // window of text
    for (i = 0; i < M; i++)
    {
        p = (d * p + pat[i]) % q;
        t = (d * t + txt[i]) % q;
    }
 
    // Slide the pattern over text one by one
    for (i = 0; i <= N - M; i++)
    {
 
        // Check the hash values of current window of text
        // and pattern. If the hash values match then only
        // check for characters one by one
        if ( p == t )
        {  
            bool flag = true;
            /* Check for characters one by one */
            for (j = 0; j < M; j++)
            {
                if (txt[i+j] != pat[j])
                {
                  flag = false;
                  break;
                }
                  if(flag)
                  cout<<i<<" ";
                      
            }
 
            // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
           
            if (j == M)
                cout<<"Pattern found at index "<< i<<endl;
        }
 
        // Calculate hash value for next window of text: Remove
        // leading digit, add trailing digit
        if ( i < N-M )
        {
            t = (d*(t - txt[i]*h) + txt[i+M])%q;
 
            // We might get negative value of t, converting it
            // to positive
            if (t < 0)
            t = (t + q);
        }
    }
}
 
/* Driver code */
int main()
{
    char txt[] = "GEEKS FOR GEEKS";
    char pat[] = "GEEK";
       
      // A prime number
    int q = 101;
     
      // Function Call
      search(pat, txt, q);
    return 0;
}
 
// This is code is contributed by rathbhupendra

/* Following program is a C implementation of Rabin Karp
Algorithm given in the CLRS book */
#include<stdio.h>
#include<string.h>
 
// d is the number of characters in the input alphabet
#define d 256
 
/* pat -> pattern
    txt -> text
    q -> A prime number
*/
void search(char pat[], char txt[], int q)
{
    int M = strlen(pat);
    int N = strlen(txt);
    int i, j;
    int p = 0; // hash value for pattern
    int t = 0; // hash value for txt
    int h = 1;
 
    // The value of h would be "pow(d, M-1)%q"
    for (i = 0; i < M-1; i++)
        h = (h*d)%q;
 
    // Calculate the hash value of pattern and first
    // window of text
    for (i = 0; i < M; i++)
    {
        p = (d*p + pat[i])%q;
        t = (d*t + txt[i])%q;
    }
 
    // Slide the pattern over text one by one
    for (i = 0; i <= N - M; i++)
    {
 
        // Check the hash values of current window of text
        // and pattern. If the hash values match then only
        // check for characters on by one
        if ( p == t )
        {
            /* Check for characters one by one */
            for (j = 0; j < M; j++)
            {
                if (txt[i+j] != pat[j])
                    break;
            }
 
            // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
            if (j == M)
                printf("Pattern found at index %d \n", i);
        }
 
        // Calculate hash value for next window of text: Remove
        // leading digit, add trailing digit
        if ( i < N-M )
        {
            t = (d*(t - txt[i]*h) + txt[i+M])%q;
 
            // We might get negative value of t, converting it
            // to positive
            if (t < 0)
            t = (t + q);
        }
    }
}
 
/* Driver Code */
int main()
{
    char txt[] = "GEEKS FOR GEEKS";
    char pat[] = "GEEK";
   
      // A prime number
    int q = 101;
   
      // function call
    search(pat, txt, q);
    return 0;
}

// Following program is a Java implementation
// of Rabin Karp Algorithm given in the CLRS book
 
public class Main
{
    // d is the number of characters in the input alphabet
    public final static int d = 256;
     
    /* pat -> pattern
        txt -> text
        q -> A prime number
    */
    static void search(String pat, String txt, int q)
    {
        int M = pat.length();
        int N = txt.length();
        int i, j;
        int p = 0; // hash value for pattern
        int t = 0; // hash value for txt
        int h = 1;
     
        // The value of h would be "pow(d, M-1)%q"
        for (i = 0; i < M-1; i++)
            h = (h*d)%q;
     
        // Calculate the hash value of pattern and first
        // window of text
        for (i = 0; i < M; i++)
        {
            p = (d*p + pat.charAt(i))%q;
            t = (d*t + txt.charAt(i))%q;
        }
     
        // Slide the pattern over text one by one
        for (i = 0; i <= N - M; i++)
        {
     
            // Check the hash values of current window of text
            // and pattern. If the hash values match then only
            // check for characters on by one
            if ( p == t )
            {
                /* Check for characters one by one */
                for (j = 0; j < M; j++)
                {
                    if (txt.charAt(i+j) != pat.charAt(j))
                        break;
                }
     
                // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
                if (j == M)
                    System.out.println("Pattern found at index " + i);
            }
     
            // Calculate hash value for next window of text: Remove
            // leading digit, add trailing digit
            if ( i < N-M )
            {
                t = (d*(t - txt.charAt(i)*h) + txt.charAt(i+M))%q;
     
                // We might get negative value of t, converting it
                // to positive
                if (t < 0)
                t = (t + q);
            }
        }
    }
     
    /* Driver Code */
    public static void main(String[] args)
    {
        String txt = "GEEKS FOR GEEKS";
        String pat = "GEEK";
           
          // A prime number
        int q = 101;
       
          // Function Call
        search(pat, txt, q);
    }
}
 
// This code is contributed by nuclode

# Following program is the python implementation of
# Rabin Karp Algorithm given in CLRS book
 
# d is the number of characters in the input alphabet
d = 256
 
# pat  -> pattern
# txt  -> text
# q    -> A prime number
 
def search(pat, txt, q):
    M = len(pat)
    N = len(txt)
    i = 0
    j = 0
    p = 0    # hash value for pattern
    t = 0    # hash value for txt
    h = 1
 
    # The value of h would be "pow(d, M-1)%q"
    for i in xrange(M-1):
        h = (h*d)%q
 
    # Calculate the hash value of pattern and first window
    # of text
    for i in xrange(M):
        p = (d*p + ord(pat[i]))%q
        t = (d*t + ord(txt[i]))%q
 
    # Slide the pattern over text one by one
    for i in xrange(N-M+1):
        # Check the hash values of current window of text and
        # pattern if the hash values match then only check
        # for characters on by one
        if p==t:
            # Check for characters one by one
            for j in xrange(M):
                if txt[i+j] != pat[j]:
                    break
                else: j+=1
 
            # if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
            if j==M:
                print "Pattern found at index " + str(i)
 
        # Calculate hash value for next window of text: Remove
        # leading digit, add trailing digit
        if i < N-M:
            t = (d*(t-ord(txt[i])*h) + ord(txt[i+M]))%q
 
            # We might get negative values of t, converting it to
            # positive
            if t < 0:
                t = t+q
 
# Driver Code
txt = "GEEKS FOR GEEKS"
pat = "GEEK"
 
# A prime number
q = 101
 
# Function Call
search(pat,txt,q)
 
# This code is contributed by Bhavya Jain

// Following program is a C# implementation
// of Rabin Karp Algorithm given in the CLRS book
using System;
public class GFG
{
    // d is the number of characters in the input alphabet
    public readonly static int d = 256;
     
    /* pat -> pattern
        txt -> text
        q -> A prime number
    */
    static void search(String pat, String txt, int q)
    {
        int M = pat.Length;
        int N = txt.Length;
        int i, j;
        int p = 0; // hash value for pattern
        int t = 0; // hash value for txt
        int h = 1;
     
        // The value of h would be "pow(d, M-1)%q"
        for (i = 0; i < M-1; i++)
            h = (h*d)%q;
     
        // Calculate the hash value of pattern and first
        // window of text
        for (i = 0; i < M; i++)
        {
            p = (d*p + pat[i])%q;
            t = (d*t + txt[i])%q;
        }
     
        // Slide the pattern over text one by one
        for (i = 0; i <= N - M; i++)
        {
     
            // Check the hash values of current window of text
            // and pattern. If the hash values match then only
            // check for characters on by one
            if ( p == t )
            {
                /* Check for characters one by one */
                for (j = 0; j < M; j++)
                {
                    if (txt[i+j] != pat[j])
                        break;
                }
     
                // if p == t and pat[0...M-1] = txt[i, i+1, ...i+M-1]
                if (j == M)
                    Console.WriteLine("Pattern found at index " + i);
            }
     
            // Calculate hash value for next window of text: Remove
            // leading digit, add trailing digit
            if ( i < N-M )
            {
                t = (d*(t - txt[i]*h) + txt[i+M])%q;
     
                // We might get negative value of t, converting it
                // to positive
                if (t < 0)
                t = (t + q);
            }
        }
    }
     
    /* Driver Code */
    public static void Main()
    {
        String txt = "GEEKS FOR GEEKS";
        String pat = "GEEK";
       
          // A prime number
        int q = 101;
       
          // Function Call
        search(pat, txt, q);
    }
}
 
// This code is contributed by PrinciRaj19992

<?php
// Following program is a PHP
// implementation of Rabin Karp
// Algorithm given in the CLRS book
 
// d is the number of characters
// in the input alphabet
$d = 256;
 
/* pat -> pattern
   txt -> text
   q -> A prime number
*/
function search($pat, $txt, $q)
{
    $M = strlen($pat);
    $N = strlen($txt);
    $i; $j;
    $p = 0; // hash value
            // for pattern
    $t = 0; // hash value
            // for txt
    $h = 1;
    $d =1;
 
    // The value of h would
    // be "pow(d, M-1)%q"
    for ($i = 0; $i < $M - 1; $i++)
        $h = ($h * $d) % $q;
 
    // Calculate the hash value
    // of pattern and first
    // window of text
    for ($i = 0; $i < $M; $i++)
    {
        $p = ($d * $p + $pat[$i]) % $q;
        $t = ($d * $t + $txt[$i]) % $q;
    }
 
    // Slide the pattern over
    // text one by one
    for ($i = 0; $i <= $N - $M; $i++)
    {
 
        // Check the hash values of
        // current window of text
        // and pattern. If the hash
        // values match then only
        // check for characters on
        // by one
        if ($p == $t)
        {
            // Check for characters
            // one by one
            for ($j = 0; $j < $M; $j++)
            {
                if ($txt[$i + $j] != $pat[$j])
                    break;
            }
 
            // if p == t and pat[0...M-1] =
            // txt[i, i+1, ...i+M-1]
            if ($j == $M)
                echo "Pattern found at index ",
                                      $i, "\n";
        }
 
        // Calculate hash value for
        // next window of text:
        // Remove leading digit,
        // add trailing digit
        if ($i < $N - $M)
        {
            $t = ($d * ($t - $txt[$i] *
                        $h) + $txt[$i +
                             $M]) % $q;
 
            // We might get negative
            // value of t, converting
            // it to positive
            if ($t < 0)
            $t = ($t + $q);
        }
    }
}
 
// Driver Code
$txt = "GEEKS FOR GEEKS";
$pat = "GEEK";
 
// A prime number
$q = 101;
 
// Function Call
search($pat, $txt, $q);
 
// This code is contributed
// by ajit
?>

<script>
 
// Following program is a Javascript implementation
// of Rabin Karp Algorithm given in the CLRS book
 
// d is the number of characters in
// the input alphabet
let d = 256;
 
/* pat -> pattern
    txt -> text
    q -> A prime number
*/
function search(pat, txt, q)
{
    let M = pat.length;
    let N = txt.length;
    let i, j;
     
    // Hash value for pattern
    let p = 0;
     
    // Hash value for txt
    let t = 0;
    let h = 1;
 
    // The value of h would be "pow(d, M-1)%q"
    for(i = 0; i < M - 1; i++)
        h = (h * d) % q;
 
    // Calculate the hash value of pattern and
    // first window of text
    for(i = 0; i < M; i++)
    {
        p = (d * p + pat[i].charCodeAt()) % q;
        t = (d * t + txt[i].charCodeAt()) % q;
    }
 
    // Slide the pattern over text one by one
    for(i = 0; i <= N - M; i++)
    {
 
        // Check the hash values of current
        // window of text and pattern. If the
        // hash values match then only
        // check for characters on by one
        if (p == t)
        {
             
            /* Check for characters one by one */
            for(j = 0; j < M; j++)
            {
                if (txt[i+j] != pat[j])
                    break;
            }
 
            // if p == t and pat[0...M-1] =
            // txt[i, i+1, ...i+M-1]
            if (j == M)
                 document.write("Pattern found at index " +
                                i + "<br/>");
        }
 
        // Calculate hash value for next window
        // of text: Remove leading digit, add
        // trailing digit
        if (i < N - M)
        {
            t = (d * (t - txt[i].charCodeAt() * h) +
                      txt[i + M].charCodeAt()) % q;
 
            // We might get negative value of t,
            // converting it to positive
            if (t < 0)
                t = (t + q);
        }
    }
}
 
// Driver code
let txt = "GEEKS FOR GEEKS";
let pat = "GEEK";
 
// A prime number
let q = 101;
 
// Function Call
search(pat, txt, q);
 
// This code is contributed by target_2
 
</script>

Saída:

Pattern found at index 0
Pattern found at index 10

Complexidade de tempo:
O tempo de execução médio e de melhor caso do algoritmo Rabin-Karp é O (n + m), mas seu tempo de pior caso é O (nm). O pior caso do algoritmo Rabin-Karp ocorre quando todos os caracteres de padrão e texto são iguais aos valores de hash de todas as substrings de txt [] que correspondem ao valor de hash de pat []. Por exemplo pat [] = “AAA” e txt [] = “AAAAAAA”.

? list = PLqM7alHXFySGwXaessYMemAnITqlZdZVE

Escreva comentários se encontrar algo incorreto ou se quiser compartilhar mais informações sobre o tópico discutido acima.
Referências:
http://net.pku.edu.cn/~course/cs101/2007/resource/Intro2Algorithm/book6/chap34.htm
http://www.cs.princeton.edu/courses/archive/fall04/cos226/ lectures / string.4up.pdf
http://en.wikipedia.org/wiki/Rabin-Karp_string_search_algorithm
Postagens relacionadas:
Pesquisando padrões | Conjunto 1 (pesquisa de padrão ingênuo)
Procurando padrões | Conjunto 2 (algoritmo KMP)

Acervo Lima

O maior acervo de tutoriais e referências

Algoritmo Rabin-Karp para pesquisa de padrões

Latest posts

Latest posts

Most popular posts