Examples of splitting with regular expressions

Java

Maybe somebody will take the time to work out some loop-less Java regular expression code that breaks a line apart and keeps all the pieces, and post it on Piazza. Until then, here's a simple-minded iterative method that works:

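    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Break s into alternating runs of letters and non-letters, keeping every piece.
    private static ArrayList<String> split(String s) {
        Matcher m = Pattern.compile("[A-Za-z]+|[^A-Za-z]+").matcher(s);

        ArrayList<String> result = new ArrayList<>();

        while (m.find()) {
            result.add(m.group());  // the text of the current match
        }

        return result;
    }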
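
For what it's worth, here is one possible sketch of a loop-less version, assuming Java 9 or later (for `Matcher.results()`). The class and method names (`KeepAllPieces`, `splitWithStream`, `splitAtBoundaries`) are made up for illustration. The first method streams the same matches the loop above finds; the second splits at the zero-width boundaries between letters and non-letters using lookbehind and lookahead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeepAllPieces {
    // Same pattern as the iterative method: a run of letters or a run of non-letters.
    private static final Pattern RUNS = Pattern.compile("[A-Za-z]+|[^A-Za-z]+");

    // Zero-width positions where a letter is followed by a non-letter, or vice versa.
    private static final Pattern BOUNDARY = Pattern.compile(
            "(?<=[A-Za-z])(?=[^A-Za-z])|(?<=[^A-Za-z])(?=[A-Za-z])");

    // Option 1 (Java 9+): collect every match from a stream instead of a while loop.
    static List<String> splitWithStream(String s) {
        return RUNS.matcher(s)
                .results()
                .map(MatchResult::group)
                .collect(Collectors.toList());
    }

    // Option 2: split on the boundaries themselves; nothing is consumed, so all pieces survive.
    static List<String> splitAtBoundaries(String s) {
        return Arrays.asList(BOUNDARY.split(s));
    }

    public static void main(String[] args) {
        String line = "To be or not to be.";
        System.out.println(splitWithStream(line));
        System.out.println(splitAtBoundaries(line));
    }
}
```

Both methods give the same twelve pieces for "To be or not to be." and, unlike the Python and PHP splits below, neither produces a leading empty string. They do differ on an empty input: the stream version returns an empty list, while `split` returns a list holding one empty string.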

Python

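The splitting is simpler in Python, because re.split keeps the delimiters when the pattern contains a capturing group. Note the leading empty string: the input starts with a match, so the piece before it is empty.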

    >>> import re
    >>> re.split("([A-Za-z]+)", "To be or not to be.")
    ['', 'To', ' ', 'be', ' ', 'or', ' ', 'not', ' ', 'to', ' ', 'be', '.']

PHP

If you already know PHP and plan to use it, here's a split for you, too:

    php > print_r(preg_split("/([A-Za-z]+)/", "To be or not to be.", -1, PREG_SPLIT_DELIM_CAPTURE));
    Array
    (
        [0] => 
        [1] => To
        [2] =>  
        [3] => be
        [4] =>  
        [5] => or
        [6] =>  
        [7] => not
        [8] =>  
        [9] => to
        [10] =>  
        [11] => be
        [12] => .
    )