How to split a string by regex pattern in Perl?
Splitting a String by a Regex Pattern in Perl
In Perl, the `split` function breaks a string into a list of substrings wherever a pattern matches. Its first argument is always interpreted as a regular expression, so splitting on a regex pattern rather than a fixed substring needs no extra syntax. Even a quoted string such as `','` is compiled as a pattern. A notable special case is the single-space string `' '`, which splits on runs of whitespace.
Here's a quick refresher on the split syntax:
```perl
split /PATTERN/, EXPR, LIMIT
```
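- `/PATTERN/`: the regex that matches the separators. Calling `split` with no arguments at all splits `$_` on whitespace.
- `EXPR`: the string to split. Defaults to `$_` if omitted.
- `LIMIT`: optional integer.
  - A positive value caps the number of fields returned, and the last field holds the unsplit remainder.
  - A negative value keeps trailing empty fields without capping the count.
  - Zero or omitted removes trailing empty fields.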
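The following is a minimal sketch of these arguments in action. It uses a made-up colon-separated record; the data and variable names are illustrative only. The script shows:
- the full field count with no `LIMIT`;
- a positive `LIMIT` that leaves the remainder unsplit in the last field;
- `EXPR` defaulting to `$_`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative sample record (not from any real system)
my $record = "alice:x:1000:1000:Alice:/home/alice:/bin/bash";

# No LIMIT: every colon-separated field is returned
my @all = split /:/, $record;

# Positive LIMIT: at most 3 fields; the last one keeps the unsplit remainder
my @three = split /:/, $record, 3;

# EXPR omitted: split operates on $_ (here aliased by the for loop)
my @words;
for ("one two three") {
    @words = split /\s+/;
}

print scalar(@all), " fields\n";   # 7 fields
print "$three[2]\n";               # 1000:1000:Alice:/home/alice:/bin/bash
print join("|", @words), "\n";     # one|two|three
```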
Using a Regex Pattern
The pattern can be any valid Perl regex, allowing you to split on single characters, character classes, or complex expressions. For example:
- Split on whitespace: `split /\s+/, $string`
- Split on commas or semicolons: `split /[;,]/, $string`
- Split on several different delimiters, or on delimiters with surrounding whitespace: `split /\s*[;,|]\s*/, $string`
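The patterns above can be tried side by side in a small script. The input strings are invented for illustration. The third pattern, `/\s*[;,|]\s*/`, is one assumed example of a "more complex" delimiter set: comma, semicolon, or pipe, with any surrounding whitespace absorbed into the separator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Runs of spaces/tabs count as one separator
my @by_space = split /\s+/, "red   green\tblue";

# Either a comma or a semicolon separates fields
my @by_punct = split /[;,]/, "red,green;blue";

# Comma, semicolon, or pipe (| is literal inside a character class),
# with optional whitespace on both sides consumed as part of the separator
my @by_mixed = split /\s*[;,|]\s*/, "red , green;blue | cyan";

print join("|", @by_space), "\n";   # red|green|blue
print join("|", @by_punct), "\n";   # red|green|blue
print join("|", @by_mixed), "\n";   # red|green|blue|cyan
```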
Context & Sigils
Note that `split` returns the list of fields in list context and the number of fields in scalar context. It's common to assign the result to an array to work with the fields individually:

```perl
my @fields = split /PATTERN/, $string;
```
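A short sketch of how the assignment target decides the context, using a made-up header string as input:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $csv = "id,name,email";   # illustrative sample input

# List context: assigning to an array keeps each field
my @fields = split /,/, $csv;

# Scalar context: assigning to a scalar yields the number of fields
my $count = split /,/, $csv;

# Assigning to a list of scalars unpacks the leading fields directly
my ($id, $name) = split /,/, $csv;

print "@fields\n";   # id name email
print "$count\n";    # 3
print "$name\n";     # name
```

When the result is assigned to a list of scalars and no `LIMIT` is given, Perl implicitly uses a limit one larger than the number of variables. Fields you don't ask for are therefore not split further.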
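Remember that in a regex, certain characters have special meaning. For example, to split on a dot (`.`), you need to escape it as `/\./`, because an unescaped `.` matches any character. This also applies when the pattern is written as a quoted string: `split '.', $string` still treats the dot as a metacharacter.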
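The sketch below contrasts an escaped and an unescaped dot on a sample IP-style string. It also shows `\Q...\E`, the inline form of `quotemeta`, for a delimiter held in a variable. The inputs and the `$sep` variable are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $ip = "192.168.0.1";   # illustrative sample input

# Escaped dot: splits only on literal '.' characters
my @octets = split /\./, $ip;

# Unescaped dot matches every character, so every field is empty;
# trailing empty fields are removed, leaving an empty list
my @oops = split /./, $ip;

# A delimiter stored in a variable may contain metacharacters ('|' here);
# \Q...\E escapes them so they are matched literally
my $sep = "|";
my @parts = split /\Q$sep\E/, "a|b|c";

print scalar(@octets), "\n";   # 4
print scalar(@oops), "\n";     # 0
print "@parts\n";              # a b c
```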
Practical Example
The following runnable Perl script splits a string on either a comma followed by optional whitespace or a run of whitespace. It prints each resulting field in brackets on its own line.
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $string = "apple, banana,orange, grape,melon";

# Split on commas (optionally followed by spaces) or on whitespace
my @fruits = split /,\s*|\s+/, $string;

print "Split fields:\n";
foreach my $fruit (@fruits) {
    print "[$fruit]\n";
}
```
Explanation
- The pattern `/,\s*|\s+/` uses alternation (`|`) to split on either of two separators:
  - a comma followed by optional whitespace;
  - a run of one or more whitespace characters.
- This handles common real-world input where the separator is a comma with varying spacing or plain whitespace.
- The separators themselves are discarded; each field holds only the text between them.
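One edge case is worth checking with this pattern. If a space appears before a comma, the two alternatives match separately and leave an empty field between them. The sketch below shows this with an invented input. It also shows one possible variant, `/\s*,\s*|\s+/`, which is an assumed fix and not part of the original example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $input = "apple , banana";   # illustrative input with a space before the comma

# Original pattern: at the space before the comma only \s+ can match;
# ",\s*" then matches next, leaving an empty field between the two matches
my @original = split /,\s*|\s+/, $input;

# Variant: allow optional whitespace before the comma as well,
# so " , " is consumed as a single separator
my @variant = split /\s*,\s*|\s+/, $input;

print join("|", map { "[$_]" } @original), "\n";   # [apple]|[]|[banana]
print join("|", map { "[$_]" } @variant), "\n";    # [apple]|[banana]
```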
Common Pitfalls
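- Forgetting to escape regex metacharacters. Splitting on a dot requires `/\./`, not `/./`.
- Special patterns behave differently from a literal reading:
  - An empty pattern (`//`) splits between every character.
  - A single-space string (`' '`) splits on runs of whitespace and ignores leading whitespace. This is also the behavior of `split` called with no arguments.
- Trailing empty fields are removed by default when the string ends with one or more separators; leading and inner empty fields are kept. Pass a negative `LIMIT`, such as `-1`, to keep trailing empty fields.
- `split` follows normal regex rules such as greedy matching and leftmost alternation, so complex patterns can split in unexpected places.
- Capturing parentheses in the pattern cause the captured separators to be returned as extra fields. Use `(?:...)` when you only need grouping.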
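The pitfalls above can be observed directly. This sketch uses invented inputs; the commented results follow Perl's documented `split` behavior.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $row = "a,b,,c,,";   # illustrative input ending with empty fields

# Default: inner empty fields are kept, trailing ones are removed
my @default = split /,/, $row;            # ("a", "b", "", "c")

# Negative LIMIT: trailing empty fields are kept as well
my @keep = split /,/, $row, -1;           # ("a", "b", "", "c", "", "")

# Empty pattern: splits between every character
my @chars = split //, "abc";              # ("a", "b", "c")

# Single-space string: awk-style, splits on runs of whitespace
# and ignores leading whitespace
my @awk = split ' ', "  x  y ";           # ("x", "y")

# Capturing parentheses: the captured separators are returned too
my @with_ops = split /([+-])/, "1+2-3";   # ("1", "+", "2", "-", "3")

print scalar(@default), " ", scalar(@keep), "\n";   # 4 6
```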
Version Notes
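The behavior of `split` with a regex has been stable across Perl 5 releases. One `split`-specific change is that, since Perl 5.12, calling `split` in scalar context no longer overwrites `@_`. Newer regex features work in `split` patterns like anywhere else, but they are not specific to `split`. Examples are named captures and possessive quantifiers, both added in Perl 5.10.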
In summary, splitting by regex in Perl is straightforward and powerful because `split` uses Perl's full regex engine. Write the pattern to match exactly the separators you need, and test it against edge cases such as repeated or trailing separators. That gives you fine-grained control over string tokenization.