Maybe somebody will take the time to work out some loop-less Java regular expression code that breaks a line apart and keeps all the pieces, and post it on Piazza. Until then, here's a simple-minded iterative method that works:
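    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Break s into alternating runs of letters and non-letters, keeping every piece.
    private static ArrayList<String> split(String s) {
        Matcher m = Pattern.compile("[A-Za-z]+|[^A-Za-z]+").matcher(s);
        ArrayList<String> result = new ArrayList<String>();
        while (m.find()) {
            // Each match is one maximal run; copy it out of the original string.
            result.add(s.substring(m.start(), m.end()));
        }
        return result;
    }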
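For anyone curious what a loop-less version might look like, here is one possible sketch, not a vetted replacement for the method above. It assumes that String.split with zero-width lookaround assertions is acceptable: the pattern matches the empty position between a letter and a non-letter, so nothing is consumed and every piece survives. The class name LooplessSplit and the constant BOUNDARY are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LooplessSplit {
    // Hypothetical name. Matches the zero-width position where a letter
    // meets a non-letter (in either order), so split() discards nothing.
    private static final String BOUNDARY =
        "(?<=[A-Za-z])(?=[^A-Za-z])|(?<=[^A-Za-z])(?=[A-Za-z])";

    static List<String> split(String s) {
        // "".split(...) yields one empty string; return an empty list
        // instead, to agree with the iterative method.
        if (s.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(s.split(BOUNDARY)));
    }

    public static void main(String[] args) {
        // Print one piece per line, bracketed so whitespace pieces are visible.
        split("To be or not to be.").forEach(p -> System.out.println("[" + p + "]"));
    }
}
```

Like the iterative method, this produces no empty string at the front of the list when the line starts with a letter.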
The splitting is simpler in Python:
    >>> import re
    >>> re.split("([A-Za-z]+)", "To be or not to be.")
    ['', 'To', ' ', 'be', ' ', 'or', ' ', 'not', ' ', 'to', ' ', 'be', '.']
If you already know PHP and plan to use it, here's a split for you, too:
    php > print_r(preg_split("/([A-Za-z]+)/", "To be or not to be.", -1, PREG_SPLIT_DELIM_CAPTURE));
    Array
    (
        [0] => 
        [1] => To
        [2] =>  
        [3] => be
        [4] =>  
        [5] => or
        [6] =>  
        [7] => not
        [8] =>  
        [9] => to
        [10] =>  
        [11] => be
        [12] => .
    )