Substring Diff


Problem Statement :


In this problem, we'll use the term "longest common substring" loosely. It refers to a pair of equal-length substrings, one from each string, that differ in at most some number of positions when compared index by index. For example, 'abc' and 'adc' differ in one position; 'aab' and 'aba' differ in two.

Given two strings and an integer k, determine the length of the longest common substring of the two strings, that is, the longest pair of equal-length substrings that differ in no more than k positions.

For example, let k = 1, s1 = abcd and s2 = bbca. First check the whole strings, which are the longest possible substrings. Neither the first nor the last characters match, so the strings differ in 2 positions, and 2 > k; we need to try shorter substrings. The next longest substrings are s1' = [abc, bcd] and s2' = [bbc, bca]. Two pairs of these differ in only 1 position: [abc, bbc] and [bcd, bca]. Both have length 3, so the answer is 3.
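
The definition can be checked directly by brute force. The sketch below is only an illustration of the definition, not an intended solution: the function name brute_force_substring_diff is hypothetical, and it runs in roughly O(n^4) time, so it is suitable only for tiny inputs such as the example above.

```python
def brute_force_substring_diff(k: int, s1: str, s2: str) -> int:
    """Naive check of the definition: try every pair of equal-length
    substrings and count the positions where they differ."""
    n = len(s1)
    for length in range(n, 0, -1):                  # longest candidates first
        for i in range(n - length + 1):             # start position in s1
            for j in range(len(s2) - length + 1):   # start position in s2
                mismatches = sum(
                    1 for t in range(length) if s1[i + t] != s2[j + t]
                )
                if mismatches <= k:
                    return length
    return 0


print(brute_force_substring_diff(1, "abcd", "bbca"))  # 3
```

Because lengths are tried from longest to shortest, the first qualifying pair found gives the answer.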

Function Description

Complete the substringDiff function. It should return an integer: the length of the longest common substring as defined above.

substringDiff has the following parameter(s):

k: an integer that represents the maximum number of differing characters in a matching pair
s1: the first string
s2: the second string

Input Format

The first line of input contains a single integer, t, the number of test cases that follow.
Each of the next t lines contains three space-separated values: an integer k and two strings, s1 and s2.
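
A minimal parsing sketch, assuming whitespace-separated tokens as described above. Splitting on any whitespace (rather than on single spaces) also tolerates repeated spaces and Windows line endings. The helper name parse_cases is hypothetical.

```python
from typing import List, Tuple


def parse_cases(text: str) -> List[Tuple[int, str, str]]:
    """Turn the raw input into a list of (k, s1, s2) triples."""
    tokens = text.split()          # any run of whitespace separates tokens
    t = int(tokens[0])             # number of test cases
    cases = []
    for c in range(t):
        k, s1, s2 = tokens[1 + 3 * c : 4 + 3 * c]
        cases.append((int(k), s1, s2))
    return cases


print(parse_cases("2\n1 abcd bbca\n0 xaa yaa\n"))
# [(1, 'abcd', 'bbca'), (0, 'xaa', 'yaa')]
```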

Constraints

1 <= t <= 10
0 <= k <= |s1|
|s1| = |s2|
1 <= |s1|, |s2| <= 1500
All characters in s1 and s2 ∈ ascii[a-z].

Output Format

For each test case, print a single integer: the length of the longest common substring whose two occurrences differ in k or fewer positions.



Solution :




In C++ :






#include <algorithm>
#include <cstdio>
#include <cstring>
using namespace std;

int n, k;
// D[i][j]: length of the longest run starting at A[i] / B[j] and moving
// along the diagonal that has at most k mismatches; K[i][j]: mismatches in it.
// Row n and column n stay zero and serve as the base case.
int D[1505][1505], K[1505][1505];
char A[1505], B[1505];

int main()
{
    int t;
    scanf("%d", &t);
    while (t--) {
        scanf("%d %s %s", &k, A, B);
        n = strlen(A);
        int r = 0;
        // Clear tables so values from a longer previous test cannot leak in.
        memset(D, 0, sizeof(D));
        memset(K, 0, sizeof(K));
        for (int i = n - 1; i >= 0; i--)
            for (int j = n - 1; j >= 0; j--) {
                // Extend the run that starts at (i+1, j+1) by one pair on the left.
                D[i][j] = D[i + 1][j + 1] + 1;
                K[i][j] = K[i + 1][j + 1] + (A[i] != B[j]);
                // Too many mismatches: shrink the run from its right end.
                while (K[i][j] > k) {
                    if (A[i + D[i][j] - 1] != B[j + D[i][j] - 1])
                        K[i][j]--;
                    D[i][j]--;
                }
                r = max(r, D[i][j]);
            }
        printf("%d\n", r);
    }
    return 0;
}








In Java :






import java.io.*;
import java.util.*;

public class Solution {

    BufferedReader in;
    PrintWriter out;
    StringTokenizer tok = new StringTokenizer("");

    public static void main(String[] args) {
        new Solution().run();
    }

    public void run() {
        try {
            long t1 = System.currentTimeMillis();
            in = new BufferedReader(new InputStreamReader(System.in));
            out = new PrintWriter(System.out);

            Locale.setDefault(Locale.US);
            solve();
            in.close();
            out.close();
            long t2 = System.currentTimeMillis();
            System.err.println("Time = " + (t2 - t1));
        } catch (Throwable t) {
            t.printStackTrace(System.err);
            System.exit(-1);
        }
    }

    String readString() throws IOException {
        while (!tok.hasMoreTokens()) {
            tok = new StringTokenizer(in.readLine());
        }
        return tok.nextToken();
    }

    int readInt() throws IOException {
        return Integer.parseInt(readString());
    }

    long readLong() throws IOException {
        return Long.parseLong(readString());
    }

    double readDouble() throws IOException {
        return Double.parseDouble(readString());
    }

    // solution
    void solve() throws IOException {
        int n = readInt();
        for (int i = 0; i < n; i++)
        {
            int k = readInt();
            String s1 = readString();
            String s2 = readString();
            out.println(Math.max(find(k, s1, s2), find(k, s2, s1)));
        }
    }
    
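    // Longest window with at most k mismatches over all diagonals where s1 is
    // shifted right by startFrom relative to s2 (s1[startFrom + x] is compared
    // with s2[x]). solve() also calls find(k, s2, s1) for the other shifts.
    int find(int k, String s1, String s2)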
    {
        int max = 0;
        for (int startFrom = 0; startFrom < s1.length(); startFrom++)
        {
            int l = 0;
            int penalty = 0;
            for (int r = 0; (r < s2.length()) && (startFrom + r < s1.length()); r++)
            {
                if (s1.charAt(startFrom + r) != s2.charAt(r))
                    penalty++;
                // Too many mismatches: advance the left end of the window.
                while (penalty > k)
                {
                   if (s1.charAt(startFrom + l) != s2.charAt(l))
                       penalty--;
                   l++;
                }
                max = Math.max(max, r - l + 1);
            }
        }
        return max;
    }
}
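
The C++ and Java solutions share one observation: a candidate pair s1[i..i+L-1], s2[j..j+L-1] compares s1[i+t] with s2[j+t], so every candidate lies on a single diagonal (fixed j - i) of the grid of character comparisons. Along one diagonal, the longest stretch with at most k mismatches can be found with a two-pointer window in linear time, giving O(n^2) overall. The Java find() slides the window left to right; the C++ table does the mirror image, extending to the left and shrinking from the right. Below is a Python sketch of the left-to-right version; the function name longest_diff_window is hypothetical.

```python
def longest_diff_window(k: int, s1: str, s2: str) -> int:
    """Longest pair of aligned substrings differing in at most k positions.

    Two-pointer sliding window along every diagonal: O(len(s1) * len(s2)).
    """
    n, m = len(s1), len(s2)
    best = 0
    # A diagonal is identified by its first pair: (i0, 0) or (0, j0).
    starts = [(i0, 0) for i0 in range(n)] + [(0, j0) for j0 in range(1, m)]
    for i0, j0 in starts:
        length = min(n - i0, m - j0)  # aligned pairs on this diagonal
        left = 0
        mismatches = 0
        for right in range(length):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Too many mismatches: advance the left end until valid again.
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            best = max(best, right - left + 1)
    return best


print(longest_diff_window(2, "tabriz", "torino"))  # 4
```

Each pointer only moves forward, so the work per diagonal is proportional to its length.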








In C :





#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAXN 1500

char buf[4096];
char diff[MAXN][MAXN];
int n, k;

void mkdiff(char *s, char *t) {
  int i, j;

  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      diff[i][j] = (s[i] == t[j]) ? 0 : 1;
}

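/* isgood(L): returns 1 if some pair of aligned length-L substrings, taken
   along one diagonal d = j - i of the diff matrix, has at most k mismatches. */
int isgood(int L) {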
  int d, i, j, sum;

  if (L <= k)
    return 1;

  for (d = -n+1; d <= n-1; d++) {
    if (d <= 0) {
      if (n + d < L)
	continue;
      sum = 0;
      for (i = -d, j = 0; i < n; i++, j++) {
	sum += diff[i][j];
	if (j >= L)
	  sum -= diff[i-L][j-L];
	if (j >= L-1 && sum <= k)
	  return 1;
      }
    } else {
      if (n - d < L)
	continue;
      sum = 0;
      for (i = 0, j = d; j < n; i++, j++) {
	sum += diff[i][j];
	if (i >= L)
	  sum -= diff[i-L][j-L];
	if (i >= L-1 && sum <= k)
	  return 1;
      }
    }
  }
  return 0;
}

/* Binary search for the largest L with isgood(L) true; valid because
   isgood is monotone (if length L is achievable, so is L-1). */
int search() {
  int i, j, m;

  // Invariant: f(i-1)=true, f(j)=false
  i = 0;
  j = n+1;
  while (i < j) {
    m = (i + j) / 2;
    if (isgood(m))
      i = m+1;
    else
      j = m;
  }
  return i-1;
}

int main() {
  int ncases;
  char *s, *t;

  fgets(buf, sizeof buf, stdin);
  ncases = atoi(buf);

  while (ncases-- > 0) {
    fgets(buf, sizeof buf, stdin);
    s = strchr(buf, ' ');
    *s++ = '\0';
    t = strchr(s, ' ');
    *t++ = '\0';
    k = atoi(buf);
    n = strlen(s);
    if (t[n] == '\n')
      t[n] = '\0';

    mkdiff(s, t);

    printf("%d\n", search());
  }

  return 0;
}
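
The C solution uses a different strategy: binary search on the answer. This is valid because the predicate is monotone: if some aligned pair of length L has at most k mismatches, dropping its last position gives a pair of length L - 1 that still qualifies. Each check scans every diagonal with a fixed-size window, so the total cost is O(n^2 log n). A Python sketch of the same idea, with hypothetical names has_window and binary_search_answer:

```python
def has_window(k: int, s1: str, s2: str, L: int) -> bool:
    """True if some aligned length-L substrings differ in at most k positions."""
    if L == 0:
        return True
    n, m = len(s1), len(s2)
    starts = [(i0, 0) for i0 in range(n)] + [(0, j0) for j0 in range(1, m)]
    for i0, j0 in starts:
        length = min(n - i0, m - j0)
        if length < L:
            continue  # this diagonal is too short
        window = 0  # mismatches among the last L pairs seen
        for t in range(length):
            window += int(s1[i0 + t] != s2[j0 + t])
            if t >= L:
                window -= int(s1[i0 + t - L] != s2[j0 + t - L])
            if t >= L - 1 and window <= k:
                return True
    return False


def binary_search_answer(k: int, s1: str, s2: str) -> int:
    """Largest L for which has_window holds (relies on monotonicity in L)."""
    lo, hi = 0, min(len(s1), len(s2))  # has_window(..., 0) is always True
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if has_window(k, s1, s2, mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


print(binary_search_answer(0, "abacba", "abcaba"))  # 3
```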








In Python3 :





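def l_func(p,q,max_s):
    # Longest run of aligned positions (p[i] vs q[i]) with at most max_s mismatches.
    n = len(q)
    res_ar = [0]   # 1-based mismatch positions, with sentinel 0
    count = 0
    ans = 0
    for i in range(n):
        if(p[i]!=q[i]):
            res_ar.append(i+1)
            count += 1
            if(count>max_s):
                # run strictly between this mismatch and the one max_s+1 earlier
                l = res_ar[-1]-res_ar[-2-max_s]-1
                if(l>ans):ans = l
    if(count<=max_s):
        return n
    # trailing run: after the (max_s+1)-th-from-last mismatch to the end of q
    return max(ans, n-res_ar[-1-max_s])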

def check_func(p,q,s):
    n = len(q)
    ans = 0
    for i in range(n):
        if(n-i<=ans):break
        l = l_func(p,q[i:],s)
        if(l>ans):
            ans = l
    for i in range(n):
        if(n-i<=ans):break
        l = l_func(q,p[i:],s)
        if(l>ans):
            ans = l
    return ans
for case_t in range(int(input())):
    str_s,p,q = input().strip().split()
    s = int(str_s)
    print(check_func(p,q,s))
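
The Python solution avoids an explicit left pointer. On one diagonal, record the 1-based positions of the mismatches; a window with at most k mismatches can then extend from just after one mismatch to just before the mismatch k + 1 places later. Placing sentinels 0 and n + 1 at both ends lets the stretches at the start and end of the diagonal use the same formula as the interior ones. A sketch of this gap formulation, with hypothetical names best_on_diagonal and substring_diff:

```python
def best_on_diagonal(k: int, a: str, b: str) -> int:
    """Longest run of aligned positions (a[i] vs b[i]) with at most k mismatches."""
    n = min(len(a), len(b))
    # 1-based mismatch positions, bracketed by sentinels 0 and n + 1.
    marks = [0] + [i + 1 for i in range(n) if a[i] != b[i]] + [n + 1]
    if len(marks) - 2 <= k:
        return n  # the whole diagonal qualifies
    # The window may contain mismatches t+1 .. t+k, so it lies strictly
    # between marks[t] and marks[t + k + 1].
    return max(marks[t + k + 1] - marks[t] - 1
               for t in range(len(marks) - k - 1))


def substring_diff(k: int, s1: str, s2: str) -> int:
    """Try every diagonal: s1 shifted against s2, then s2 shifted against s1."""
    best = 0
    for d in range(max(len(s1), len(s2))):
        best = max(best,
                   best_on_diagonal(k, s1[d:], s2),
                   best_on_diagonal(k, s1, s2[d:]))
    return best


print(substring_diff(3, "helloworld", "yellomarin"))  # 8
```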
                        








View More Similar Problems

Merging Communities

People connect with each other in a social network. A connection between Person I and Person J is represented as . When two persons belonging to different communities connect, the net effect is the merger of both communities which I and J belong to. At the beginning, there are N people representing N communities. Suppose persons 1 and 2 connected and later 2 and 3 connected, then 1, 2 and 3 w

Components in a graph

There are 2 * N nodes in an undirected graph, and a number of edges connecting some nodes. In each edge, the first value will be between 1 and N, inclusive. The second node will be between N + 1 and 2 * N, inclusive. Given a list of edges, determine the size of the smallest and largest connected components that have or more nodes. A node can have any number of connections. The highest node valu

Kundu and Tree

Kundu is a true tree lover. A tree is a connected graph having N vertices and N-1 edges. Today when he got a tree, he colored each edge either red (r) or black (b). He is interested in knowing how many triplets (a, b, c) of vertices there are such that there is at least one red edge on each of the three paths, i.e. from vertex a to b, vertex b to c and vertex c to a. Note that

Super Maximum Cost Queries

Victoria has a tree, T, consisting of N nodes numbered from 1 to N. Each edge from node Ui to Vi in tree T has an integer weight, Wi. Let's define the cost, C, of a path from some node X to some other node Y as the maximum weight (W) for any edge in the unique path from node X to node Y. Victoria wants your help processing Q queries on tree T, where each query contains 2 integers, L and

Contacts

We're going to make our own Contacts application! The application must perform two types of operations: 1. add name, where name is a string denoting a contact name. This must store name as a new contact in the application. 2. find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the co

No Prefix Set

There is a given list of strings where each string contains only lowercase letters from a - j, inclusive. The set of strings is said to be a GOOD SET if no string is a prefix of another string. In this case, print GOOD SET. Otherwise, print BAD SET on the first line followed by the string being checked. Note If two strings are identical, they are prefixes of each other. Function Descriptio
