venerdì 2 marzo 2012

Edit Distance Java - Similar String

The edit distance of two strings commonly refers to the Levenshtein Distance of two strings- the amount of work it takes to transform one string to the other by either inserting, deleting, or changing one character at a time.

For example, changing sitting -> kitten takes 3
moves: sitting -> kitting, kitting -> kittin, and finally kittin -> kitten.

The algorithm to solve this in reasonable time takes dynamic programming, and has many applications such as word checking, and typing speed tests. Algorithm below:


public static int editDistance(String s, String t){
    int m=s.length();
    int n=t.length();
    int[][]d=new int[m+1][n+1];
    for(int i=0;i<=m;i++){
      d[i][0]=i;
    }
    for(int j=0;j<=n;j++){
      d[0][j]=j;
    }
    for(int j=1;j<=n;j++){
      for(int i=1;i<=m;i++){
        if(s.charAt(i-1)==t.charAt(j-1)){
          d[i][j]=d[i-1][j-1];
        }
        else{
          d[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+1));
        }
      }
    }
    return(d[m][n]);
  }
  public static int min(int a,int b,int c){
    return(Math.min(Math.min(a,b),c));
  }

It is important to note that there are only three possibilities one string can get to another: either deleting, inserting, or changing.

Thus, if we let L(i,j) indicate the cost it takes from changing i to j, if s and t are the original strings, the cost is:

Math.min(1+L(i-1,j-1),1+L(i,j-1),1+L(i-1,j)) unless the last characters of the strings are the same, then the minimum is L(i-1,j-1). This solves the problem, but note that it is very slow because of all the recursive calls, which repeat many similiar problems.

Thus, we use dynamic programming: Construct a matrix d[m][n] with sizes s.length() and t.length(). instead of recursive calls, we will use a matrix lookup which will store the smallest current cost to change a smaller i into j.

Thus, with he java algorithm code for edit distance above, the speed is O(mn) where m and n are the sizes of the strings.

Nessun commento:

Posta un commento