Lossy Run-Length Encoding - Microsoft Top Interview Questions


Problem Statement :


You are given a lowercase alphabet string s and an integer k. 

Consider run-length encoding a string by replacing each run of repeated successive characters with its count followed by the character.

For example, the string "aabbbc" would be encoded as "2a3bc". Note that we write "c" rather than "1c", since c appears only once in its run.
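A minimal Python sketch of the encoding rule. The helper names `encode` and `run_cost` are illustrative and not part of the problem statement. `run_cost` captures the per-run rule the problem relies on: a run of length 1 costs one character, and a longer run costs its number of decimal digits plus one.

```python
from itertools import groupby


def run_cost(length: int) -> int:
    """Encoded length of one run: "c" for length 1, "<count>c" otherwise."""
    return 1 if length == 1 else len(str(length)) + 1


def encode(s: str) -> str:
    """Run-length encode s, omitting the count for runs of length 1."""
    parts = []
    for ch, group in groupby(s):
        n = len(list(group))
        parts.append(ch if n == 1 else f"{n}{ch}")
    return "".join(parts)


sample = "aabbbc"
print(encode(sample))  # 2a3bc
print(sum(run_cost(len(list(g))) for _, g in groupby(sample)))  # 5
```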

Given that you first remove any k consecutive characters of your choice from s, return the minimum possible length of the resulting run-length encoding.
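A direct reading of the statement is to try every window of k consecutive characters and encode what remains. The sketch below (hypothetical names `encoded_length` and `solve_brute`) does exactly that. It runs in O(n²) time, which is far too slow for n = 100,000, but it is handy as a reference when checking faster solutions on small inputs.

```python
from itertools import groupby


def encoded_length(s: str) -> int:
    # A run of length 1 costs 1 character; a longer run costs its digit count plus 1.
    total = 0
    for _, group in groupby(s):
        n = len(list(group))
        total += 1 if n == 1 else len(str(n)) + 1
    return total


def solve_brute(s: str, k: int) -> int:
    """Try every window s[i:i + k]; O(n^2) overall, so only for small inputs."""
    n = len(s)
    return min(encoded_length(s[:i] + s[i + k:]) for i in range(n - k + 1))


print(solve_brute("aaaaabbaaaaaccaaa", 2))  # 6
```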

Constraints

k ≤ n ≤ 100,000 where n is the length of s.

Example 1

Input

s = "aaaaabbaaaaaccaaa"

k = 2

Output

6

Explanation

The two obvious choices are to remove the "bb" or the "cc".

If we remove the "bb", we get "10a2c3a", which has length 7.

If we remove the "cc", we get "5a2b8a", which has length 6.
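The "cc" case shows why the two remaining pieces cannot simply be encoded independently and added. The left piece "aaaaabbaaaaa" encodes to "5a2b5a" (6 characters) and the right piece "aaa" to "3a" (2 characters), a naive total of 8. Because both pieces meet on an 'a', the boundary runs of 5 and 3 fuse into a single run of 8, so the costs of the two separate runs are subtracted and the cost of the fused run is added back. This is the correction both solutions below apply. A small arithmetic sketch (the helper name `run_cost` is illustrative):

```python
def run_cost(length: int) -> int:
    # "c" for a single character, "<count>c" for longer runs
    return 1 if length == 1 else len(str(length)) + 1


# Removing "cc" from "aaaaabbaaaaaccaaa" leaves "aaaaabbaaaaa" and "aaa".
left_cost = run_cost(5) + run_cost(2) + run_cost(5)  # "5a2b5a" -> 6
right_cost = run_cost(3)                             # "3a"     -> 2
naive = left_cost + right_cost                       # 8

# Both pieces meet on 'a', so the runs of 5 and 3 fuse into one run of 8.
merged = naive - run_cost(5) - run_cost(3) + run_cost(5 + 3)
print(naive, merged)  # 8 6
```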



Solution :







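Solution in C++ :

#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// One run of equal characters, s[lhs..rhs] (inclusive). total is the encoded
// length of this run plus every run between it and its end of the string.
struct state {
    int lhs, rhs;
    int total;
    state() {
    }
    state(int a, int b) {
        lhs = a;
        rhs = b;
        total = 0;
    }
};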
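// Characters needed for a run's count: 0 for a run of length 1 (no count is
// written), otherwise the number of decimal digits in x.
int lenof(int x) {
    if (x == 1) return 0;
    return to_string(x).size();
}
// Set curr's cumulative total from its neighbour prev: prev's total plus
// this run's count digits plus 1 for the character itself.
void update(state& prev, state& curr) {
    curr.total = prev.total + lenof(curr.rhs - curr.lhs + 1) + 1;
}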
class Solution {
    public:
    int solve(string s, int k) {
        int n = s.size();
        if (n == k) return 0;
        vector<state> lhs;
        lhs.emplace_back(-1, -1);
        for (int i = 0; i < n - k; i++) {
            if (i == 0 || s[i] != s[i - 1]) {
                lhs.emplace_back(i, i);
            } else {
                lhs.back().rhs++;
            }
            update(lhs[lhs.size() - 2], lhs.back());
        }
        // initial estimate - delete the entire suffix
        int ret = lhs.back().total;
        vector<state> rhs;
        rhs.emplace_back(n, n);
        for (int i = n - k - 1; i >= 0; i--) {
            // add the rightmost unadded character to the right side
            int add = i + k;
            if (add == n - 1 || s[add] != s[add + 1]) {
                rhs.emplace_back(add, add);
            } else {
                rhs.back().lhs--;
            }
            update(rhs[rhs.size() - 2], rhs.back());
            // remove the rightmost added character from the left
            if (lhs.back().lhs == lhs.back().rhs) {
                lhs.pop_back();
            } else {
                lhs.back().rhs--;
            }
            if (lhs.size() > 1) {
                update(lhs[lhs.size() - 2], lhs.back());
            }
            // new naive estimate, just stick the two together
            ret = min(ret, lhs.back().total + rhs.back().total);
            // is it possible that the two ends can be stuck together?
            if (lhs.size() > 1 && rhs.size() > 1 && s[lhs.back().rhs] == s[rhs.back().lhs]) {
                // add together all the components that are not involved in the merge
                int cand = lhs[lhs.size() - 2].total + rhs[rhs.size() - 2].total;
                // recompute the compressed length
                int tot = rhs.back().rhs - rhs.back().lhs + 1;
                tot += lhs.back().rhs - lhs.back().lhs + 1;
                cand += lenof(tot) + 1;
                ret = min(ret, cand);
            }
        }
        return ret;
    }
};


int solve(string s, int k) {
    return Solution().solve(s, k);
}




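Solution in Python :

from itertools import groupby


class Solution:
    def solve(self, S, K):
        N = len(S)
        if N == K:
            return 0

        # left[i]: length of the run of S[i] that ends at index i
        left = [1] * N
        for i in range(N - 1):
            if S[i] == S[i + 1]:
                left[i + 1] = left[i] + 1

        # right[i]: length of the run of S[i] that starts at index i
        right = [1] * N
        for i in reversed(range(N - 1)):
            if S[i] == S[i + 1]:
                right[i] = right[i + 1] + 1

        # Encoded length of a single run of length x (0 for an empty run)
        def rle(x):
            return x if x <= 1 else len(str(x)) + 1

        # Lengths of the maximal runs of S, left to right
        R = [len(list(g)) for _, g in groupby(S)]

        # prefix[i]: encoded length of S[:i + 1]
        prefix = [0] * N
        prev = 0
        i = 0
        for x in R:
            for j in range(1, 1 + x):
                prefix[i] = prev + rle(j)
                i += 1
            prev += rle(x)

        # suffix[i]: encoded length of S[i:]
        suffix = [0] * N
        prev = 0
        i = N - 1
        for x in reversed(R):
            for j in range(1, 1 + x):
                suffix[i] = prev + rle(j)
                i -= 1
            prev += rle(x)

        # Removing the last K or the first K characters leaves a single piece
        ans = min(prefix[~K], suffix[K])
        # Otherwise keep S[:i + 1] and S[i + K + 1:], removing the K characters between
        for i in range(len(S) - K - 1):
            cand = prefix[i] + suffix[i + K + 1]
            lv = left[i]
            rv = right[i + K + 1]
            # If the two pieces meet on the same character, their boundary runs fuse
            if S[i] == S[i + K + 1]:
                cand -= rle(lv) + rle(rv)
                cand += rle(lv + rv)
            ans = min(ans, cand)
        return ans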
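Because n can reach 100,000, brute force is too slow for real inputs, but it makes a good oracle. One way to gain confidence in the prefix/suffix approach is to compare it with brute force on many small random strings. The sketch below restates the idea of the Python solution in condensed form; `fast`, `brute` and `cost` are illustrative names, and the alphabet "aab" and the size limits are arbitrary choices (the extra 'a' makes runs of 10 or more, and hence two-digit counts, show up occasionally).

```python
import random
from itertools import groupby


def cost(x: int) -> int:
    """Encoded length of a run of length x (0 for an empty run)."""
    return x if x <= 1 else len(str(x)) + 1


def brute(s: str, k: int) -> int:
    """Try every window s[i:i + k]; O(n^2), small inputs only."""
    def enc_len(t: str) -> int:
        return sum(cost(len(list(g))) for _, g in groupby(t))
    return min(enc_len(s[:i] + s[i + k:]) for i in range(len(s) - k + 1))


def fast(s: str, k: int) -> int:
    """Prefix/suffix encoded lengths plus a fix-up when the two pieces fuse."""
    n = len(s)
    if n == k:
        return 0
    # prefix[i]: encoded length of s[:i + 1]; left[i]: run length ending at i
    prefix, left = [0] * n, [0] * n
    done = 0  # cost of the completed runs before the current one
    for i in range(n):
        if i > 0 and s[i] == s[i - 1]:
            left[i] = left[i - 1] + 1
        else:
            if i > 0:
                done += cost(left[i - 1])
            left[i] = 1
        prefix[i] = done + cost(left[i])
    # suffix[i]: encoded length of s[i:]; right[i]: run length starting at i
    suffix, right = [0] * n, [0] * n
    done = 0
    for i in range(n - 1, -1, -1):
        if i < n - 1 and s[i] == s[i + 1]:
            right[i] = right[i + 1] + 1
        else:
            if i < n - 1:
                done += cost(right[i + 1])
            right[i] = 1
        suffix[i] = done + cost(right[i])
    # Removing the last k or the first k characters leaves a single piece.
    best = min(prefix[n - k - 1], suffix[k])
    for i in range(n - k - 1):
        j = i + k + 1  # keep s[:i + 1] and s[j:]
        cand = prefix[i] + suffix[j]
        if s[i] == s[j]:  # boundary runs fuse into one
            cand += cost(left[i] + right[j]) - cost(left[i]) - cost(right[j])
        best = min(best, cand)
    return best


random.seed(0)
for _ in range(2000):
    n = random.randint(1, 20)
    s = "".join(random.choice("aab") for _ in range(n))
    k = random.randint(0, n)
    assert fast(s, k) == brute(s, k), (s, k)
print("all random tests passed")
```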
                    


View More Similar Problems

Super Maximum Cost Queries

Victoria has a tree, T, consisting of N nodes numbered from 1 to N. Each edge from node Ui to Vi in tree T has an integer weight, Wi. Let's define the cost, C, of a path from some node X to some other node Y as the maximum weight (W) for any edge in the unique path from node X to node Y. Victoria wants your help processing Q queries on tree T, where each query contains 2 integers, L and


Contacts

We're going to make our own Contacts application! The application must perform two types of operations: 1. add name, where name is a string denoting a contact name. This must store name as a new contact in the application. 2. find partial, where partial is a string denoting a partial name to search the application for. It must count the number of contacts starting with partial and print the co


No Prefix Set

There is a given list of strings where each string contains only lowercase letters from a to j, inclusive. The set of strings is said to be a GOOD SET if no string is a prefix of another string. In this case, print GOOD SET. Otherwise, print BAD SET on the first line followed by the string being checked. Note: if two strings are identical, they are prefixes of each other. Function Descriptio


Cube Summation

You are given a 3-D Matrix in which each block contains 0 initially. The first block is defined by the coordinate (1,1,1) and the last block is defined by the coordinate (N,N,N). There are two types of queries. UPDATE x y z W updates the value of block (x,y,z) to W. QUERY x1 y1 z1 x2 y2 z2 calculates the sum of the value of blocks whose x coordinate is between x1 and x2 (inclusive), y coor


Direct Connections

Enter-View (EV) is a linear, street-like country. By linear, we mean all the cities of the country are placed on a single straight line, the x-axis. Thus every city's position can be defined by a single coordinate, xi, the distance from the left borderline of the country. You can treat all cities as single points. Unfortunately, the dictator of telecommunication of EV (Mr. S. Treat Jr.) do


Subsequence Weighting

A subsequence of a sequence is a sequence which is obtained by deleting zero or more elements from the sequence. You are given a sequence A in which every element is a pair of integers, i.e. A = [(a1, w1), (a2, w2),..., (aN, wN)]. For a subsequence B = [(b1, v1), (b2, v2), ...., (bM, vM)] of the given sequence: we call it increasing if for every i (1 <= i < M), bi < bi+1. Weight(B) =
