Tag Content Extractor Java


Problem Statement :


In a tag-based language like XML or HTML, contents are enclosed between a start tag and an end tag like <tag>contents</tag>. Note that the corresponding end tag starts with a /.

Given a string of text in a tag-based language, parse this text and retrieve the contents enclosed within sequences of well-organized tags meeting the following criterion:

1.The name of the start and end tags must be same. The HTML code <h1>Hello World</h2> is not valid, because the text starts with an h1 tag and ends with a non-matching h2 tag.
2.Tags can be nested, but content between nested tags is considered not valid. For example, in <h1><a>contents</a>invalid</h1>, contents is valid but invalid is not valid.
3.Tags can consist of any printable characters.

Input Format

The first line of input contains a single integer, N (the number of lines).
The N subsequent lines each contain a line of text.

Constraints

1<=N<=100
Each line contains a maximum of 10^4 printable characters.
The total number of characters in all test cases will not exceed 10^6.

Output Format

For each line, print the content enclosed within valid tags.
If a line contains multiple instances of valid content, print out each instance of valid content on a new line; if no valid content is found, print None.



Solution :



title-img


                            Solution in C :

import java.util.Scanner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Solution
{
    public static void main(String[] args){

        String pattern = "<([^>]+)>([^<>]+)</\\1>";
        Pattern r = Pattern.compile(pattern);

        Scanner in = new Scanner(System.in);
        int testCases = Integer.parseInt(in.nextLine());
        while(testCases > 0){
            String line = in.nextLine();

            Matcher m = r.matcher(line);
            boolean findMatch = true;
            while(m.find( )){
                System.out.println(m.group(2));
                findMatch = false;
            } 
            if(findMatch) {
                System.out.println("None");
            }
            testCases--;
        }
    }
}
                        








View More Similar Problems

Square-Ten Tree

The square-ten tree decomposition of an array is defined as follows: The lowest () level of the square-ten tree consists of single array elements in their natural order. The level (starting from ) of the square-ten tree consists of subsequent array subsegments of length in their natural order. Thus, the level contains subsegments of length , the level contains subsegments of length , the

View Solution →

Balanced Forest

Greg has a tree of nodes containing integer data. He wants to insert a node with some non-zero integer value somewhere into the tree. His goal is to be able to cut two edges and have the values of each of the three new trees sum to the same amount. This is called a balanced forest. Being frugal, the data value he inserts should be minimal. Determine the minimal amount that a new node can have to a

View Solution →

Jenny's Subtrees

Jenny loves experimenting with trees. Her favorite tree has n nodes connected by n - 1 edges, and each edge is ` unit in length. She wants to cut a subtree (i.e., a connected part of the original tree) of radius r from this tree by performing the following two steps: 1. Choose a node, x , from the tree. 2. Cut a subtree consisting of all nodes which are not further than r units from node x .

View Solution →

Tree Coordinates

We consider metric space to be a pair, , where is a set and such that the following conditions hold: where is the distance between points and . Let's define the product of two metric spaces, , to be such that: , where , . So, it follows logically that is also a metric space. We then define squared metric space, , to be the product of a metric space multiplied with itself: . For

View Solution →

Array Pairs

Consider an array of n integers, A = [ a1, a2, . . . . an] . Find and print the total number of (i , j) pairs such that ai * aj <= max(ai, ai+1, . . . aj) where i < j. Input Format The first line contains an integer, n , denoting the number of elements in the array. The second line consists of n space-separated integers describing the respective values of a1, a2 , . . . an .

View Solution →

Self Balancing Tree

An AVL tree (Georgy Adelson-Velsky and Landis' tree, named after the inventors) is a self-balancing binary search tree. In an AVL tree, the heights of the two child subtrees of any node differ by at most one; if at any time they differ by more than one, rebalancing is done to restore this property. We define balance factor for each node as : balanceFactor = height(left subtree) - height(righ

View Solution →