Lossy Run-Length Encoding - Microsoft Top Interview Questions
Problem Statement :
You are given a string s of lowercase letters and an integer k.

Consider an operation that run-length encodes a string by representing each run of repeated successive characters as a count followed by the character. For example, the string "aabbbc" would be encoded as "2a3bc". Note that we don't put "1c" for "c", since it appears only once in its run.

Given that you can first remove any k consecutive characters from s, return the minimum possible length of the resulting run-length encoding.

Constraints

k ≤ n ≤ 100,000 where n is the length of s.

Example 1

Input

s = "aaaaabbaaaaaccaaa"
k = 2

Output

6

Explanation

The two obvious choices are to remove the "bb"s or the "cc"s. If we remove the "bb"s, we get "10a2c3a", which has a length of 7. If we remove the "cc"s, we get "5a2b8a", which has a length of 6.
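The encoding rule in the statement can be sketched in a few lines of Python. The helper name rle_encode is an assumption made for illustration and is not part of the original solutions; it only shows how runs of length 1 are written without a count.

```python
from itertools import groupby


def rle_encode(s: str) -> str:
    """Run-length encode s, omitting the count for runs of length 1.

    Hypothetical helper for illustration only.
    """
    parts = []
    for ch, group in groupby(s):
        run = len(list(group))
        # A single character is written as-is; longer runs get a count prefix.
        parts.append(ch if run == 1 else f"{run}{ch}")
    return "".join(parts)


print(rle_encode("aabbbc"))           # 2a3bc
print(rle_encode("aaaaabbaaaaaaaa"))  # 5a2b8a (Example 1 after removing "cc")
```

The answer to the problem is the length of such an encoding, minimized over every choice of k consecutive characters to remove.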
Solution :
Solution in C++ :
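#include <algorithm>
#include <string>
#include <vector>
using namespace std;

// A run of equal characters covering indices [lhs, rhs]; total is the
// encoded length of every run up to and including this one.
struct state {
    int lhs, rhs;
    int total;
    state() {
    }
    state(int a, int b) {
        lhs = a;
        rhs = b;
        total = 0;
    }
};

// number of digits needed for the count of a run (0 for a run of length 1)
int lenof(int x) {
    if (x == 1) return 0;
    return to_string(x).size();
}

void update(state& prev, state& curr) {
    curr.total = prev.total + lenof(curr.rhs - curr.lhs + 1) + 1;
}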
class Solution {
public:
    int solve(string s, int k) {
        int n = s.size();
        if (n == k) return 0;
        vector<state> lhs;
        lhs.emplace_back(-1, -1);
        for (int i = 0; i < n - k; i++) {
            if (i == 0 || s[i] != s[i - 1]) {
                lhs.emplace_back(i, i);
            } else {
                lhs.back().rhs++;
            }
            update(lhs[lhs.size() - 2], lhs.back());
        }
        // initial estimate - delete the entire suffix
        int ret = lhs.back().total;
        vector<state> rhs;
        rhs.emplace_back(n, n);
        for (int i = n - k - 1; i >= 0; i--) {
            // add the rightmost unadded character to the right side
            int add = i + k;
            if (add == n - 1 || s[add] != s[add + 1]) {
                rhs.emplace_back(add, add);
            } else {
                rhs.back().lhs--;
            }
            update(rhs[rhs.size() - 2], rhs.back());
            // remove the rightmost added character from the left
            if (lhs.back().lhs == lhs.back().rhs) {
                lhs.pop_back();
            } else {
                lhs.back().rhs--;
            }
            if (lhs.size() > 1) {
                update(lhs[lhs.size() - 2], lhs.back());
            }
            // new naive estimate, just stick the two together
            ret = min(ret, lhs.back().total + rhs.back().total);
            // is it possible that the two ends can be stuck together?
            if (lhs.size() > 1 && rhs.size() > 1 && s[lhs.back().rhs] == s[rhs.back().lhs]) {
                // add together all the components that are not involved in the merge
                int cand = lhs[lhs.size() - 2].total + rhs[rhs.size() - 2].total;
                // recompute the compressed length
                int tot = rhs.back().rhs - rhs.back().lhs + 1;
                tot += lhs.back().rhs - lhs.back().lhs + 1;
                cand += lenof(tot) + 1;
                ret = min(ret, cand);
            }
        }
        return ret;
    }
};

int solve(string s, int k) {
    // use a temporary instead of `new` so the Solution object is not leaked
    return Solution().solve(s, k);
}
Solution in Python :
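from itertools import groupby


class Solution:
    def solve(self, S, K):
        N = len(S)
        if N == K:
            return 0

        # left[i]: length of the run of equal characters ending at index i
        left = [1] * N
        for i in range(N - 1):
            if S[i] == S[i + 1]:
                left[i + 1] = left[i] + 1

        # right[i]: length of the run of equal characters starting at index i
        right = [1] * N
        for i in reversed(range(N - 1)):
            if S[i] == S[i + 1]:
                right[i] = right[i + 1] + 1

        def rle(x):
            # encoded length of a run of x equal characters
            return x if x <= 1 else len(str(x)) + 1

        R = [len(list(g)) for _, g in groupby(S)]

        # prefix[i]: encoded length of S[:i + 1]
        prefix = [0] * N
        prev = 0
        i = 0
        for x in R:
            for j in range(1, 1 + x):
                prefix[i] = prev + rle(j)
                i += 1
            prev += rle(x)

        # suffix[i]: encoded length of S[i:]
        suffix = [0] * N
        prev = 0
        i = N - 1
        for x in reversed(R):
            for j in range(1, 1 + x):
                suffix[i] = prev + rle(j)
                i -= 1
            prev += rle(x)

        # remove the last K characters or the first K characters
        ans = min(prefix[~K], suffix[K])

        # remove S[i + 1 : i + K + 1], keeping S[:i + 1] and S[i + K + 1:]
        for i in range(len(S) - K - 1):
            cand = prefix[i] + suffix[i + K + 1]
            lv = left[i]
            rv = right[i + K + 1]
            if S[i] == S[i + K + 1]:
                # the two runs touching the gap merge into a single run
                cand -= rle(lv) + rle(rv)
                cand += rle(lv + rv)
            ans = min(ans, cand)
        return ans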
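
For cross-checking a solution on small inputs, a quadratic brute force that tries every window of k consecutive characters can be useful. This is only a sketch: the names encoded_length and brute_force are assumptions introduced here, not part of the original solutions, and the approach is far too slow for n near 100,000.

```python
from itertools import groupby


def encoded_length(s: str) -> int:
    """Length of the run-length encoding of s (runs of 1 cost one character)."""
    total = 0
    for _, group in groupby(s):
        run = len(list(group))
        total += 1 if run == 1 else len(str(run)) + 1
    return total


def brute_force(s: str, k: int) -> int:
    """Try every window of k consecutive characters; O(n^2) time overall."""
    return min(
        encoded_length(s[:i] + s[i + k:])
        for i in range(len(s) - k + 1)
    )


print(brute_force("aaaaabbaaaaaccaaa", 2))  # 6
```

Comparing brute_force against the linear-time solution on many short random strings is a quick way to catch off-by-one mistakes in the prefix and suffix bookkeeping.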