Welcome to So You Wanna Learn Regex? Part 6.
OK, I know I said part 5 would be the last part in the series, but I just had to work this one out and wanted to share. Remember, if you want more tutorials about regex, especially more advanced ones than the Mickey Mouse ones here, go bug Ben. He knows more about this than I ever will and I hear he has a blog…
In our last exercise, we looked at cleaning up some data scripts.
In this exercise, we are going to reformat a configuration file from .ini style to ColdSpring MapFactory style. Specifically, I’m integrating CFFormProtect into an application and I want the config to be managed in ColdSpring with the rest of my configurations. Sure, I could go flapping around with copy+paste, smashing keys, burning tendons, but that seems so Junior Programmerish, doesn’t it?
Assume this set of declarations:
mouseMovement=1
usedKeyboard=1
timedFormSubmission=1
hiddenFormField=1
akismet=0
tooManyUrls=1
teststrings=1
projectHoneyPot=0
timedFormMinSeconds=5
timedFormMaxSeconds=3600
encryptionKey=JacobMuns0n
akismetAPIKey=
akismetBlogURL=
akismetFormNameField=
akismetFormEmailField=
akismetFormURLField=
akismetFormBodyField=
tooManyUrlsMaxUrls=6
What we want is to turn: mouseMovement=1
into: <entry key="mouseMovement"><value>1</value></entry>
Note we’ve split a string delimited by an equals sign into some XML nodes.
So as you know, we define this pattern in the gobbledegook of regular expressions. When read one chunk at a time, these actually make sense. We’ll go through the exercise, then look at why it worked.
In Eclipse, perform the following:
- Open a new file and paste in the above set of declarations (yes, the whole thing)
- Open the find dialogue (I use CTRL+F) and make sure the Regular Expression option is ticked
- Enter the following in the Find: input: (.*[^=])=(.*)
- Enter the following in the Replace: input: <entry key="$1"><value>$2</value></entry>
- Press Find and make sure the pattern matches what we want
- Lastly, press Replace All
You Should Have This:
<entry key="mouseMovement"><value>1</value></entry>
<entry key="usedKeyboard"><value>1</value></entry>
<entry key="timedFormSubmission"><value>1</value></entry>
<entry key="hiddenFormField"><value>1</value></entry>
<entry key="akismet"><value>0</value></entry>
<entry key="tooManyUrls"><value>1</value></entry>
<entry key="teststrings"><value>1</value></entry>
<entry key="projectHoneyPot"><value>0</value></entry>
<entry key="timedFormMinSeconds"><value>5</value></entry>
<entry key="timedFormMaxSeconds"><value>3600</value></entry>
<entry key="encryptionKey"><value>JacobMuns0n</value></entry>
<entry key="akismetAPIKey"><value></value></entry>
<entry key="akismetBlogURL"><value></value></entry>
<entry key="akismetFormNameField"><value></value></entry>
<entry key="akismetFormEmailField"><value></value></entry>
<entry key="akismetFormURLField"><value></value></entry>
<entry key="akismetFormBodyField"><value></value></entry>
<entry key="tooManyUrlsMaxUrls"><value>6</value></entry>
(If not, you missed a step. Look at the image and compare it with what you have in your Find/Replace dialog. Make sure there is no extra whitespace in the find expression.)
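If you would rather script this than click through Eclipse, here is a rough Python sketch of the same find/replace. It is not part of the Eclipse exercise: the names INI_TEXT and ini_to_entries are made up for illustration, and Python writes the group references as \1 and \2 where Eclipse uses $1 and $2.

```python
import re

# The sample declarations from this post, one per line.
INI_TEXT = """\
mouseMovement=1
usedKeyboard=1
timedFormSubmission=1
hiddenFormField=1
akismet=0
tooManyUrls=1
teststrings=1
projectHoneyPot=0
timedFormMinSeconds=5
timedFormMaxSeconds=3600
encryptionKey=JacobMuns0n
akismetAPIKey=
akismetBlogURL=
akismetFormNameField=
akismetFormEmailField=
akismetFormURLField=
akismetFormBodyField=
tooManyUrlsMaxUrls=6
"""

# Same pattern as the Find field. Without re.DOTALL, "." does not match
# a newline, so each match stays on a single line.
PATTERN = re.compile(r"(.*[^=])=(.*)")

# Python spells the group references \1 and \2 instead of $1 and $2.
REPLACEMENT = r'<entry key="\1"><value>\2</value></entry>'


def ini_to_entries(text: str) -> str:
    """Rewrite every key=value line as an <entry> element (hypothetical helper)."""
    return PATTERN.sub(REPLACEMENT, text)


if __name__ == "__main__":
    print(ini_to_entries(INI_TEXT), end="")
```

Run as a script, this should print the same eighteen entry lines shown above, including the empty value elements for the blank akismet settings.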
Blamo! The configuration data has changed from the .ini format to the ColdSpring XML format. Look at how much money you saved from having to ice down your wrists. Let’s decode the code, shall we?
Here is the find portion of the regular expression: (.*[^=])=(.*)
- () The first chunk is surrounded by parentheses. This means we’ll be defining an assignable group (also called a capture group) that we can refer to later by number.
- .*[^=] Inside the first set of parentheses are .* meaning any run of characters, followed by [^=] meaning one character that is not an equals sign. Together with the equals sign that comes next, (.*[^=]) means: starting at the beginning, give me an assignable group of everything in front of the equals sign. Because .* is greedy, a line with more than one equals sign would be split at the last one; every line in our config has exactly one, so that doesn’t bite us here.
- = Next we have a literal equals sign, because this is the boundary between the first group and the second.
- (.*) Next is another chunk surrounded by parentheses. This means we’ll be defining another assignable group.
- .* Inside the second set of parentheses are .* meaning any run of characters. Since nothing follows it, this group takes everything up until the end of the line.
All of that defines the boundaries for a character-walking regular expression gnome to start at the beginning of each line, grab a first group of characters before the equals sign and a second group of characters after the equals sign, and name them $1 and $2 for us.
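To watch the gnome at work, here is a small Python sketch that runs the find pattern against single lines and prints the two groups. The helper split_line, the STRICT pattern and the value abc=123 are my own illustrations, not part of CFFormProtect or the Eclipse exercise. The third call shows what happens when a value itself contains an equals sign, and STRICT is one possible tighter pattern if that ever matters to you.

```python
import re

# The find pattern from this post.
PATTERN = re.compile(r"(.*[^=])=(.*)")

# Hypothetical alternative: the key may not contain "=" at all,
# so the split always happens at the first equals sign.
STRICT = re.compile(r"^([^=]+)=(.*)$")


def split_line(pattern: re.Pattern, line: str):
    """Return (group 1, group 2) for a line, or None if it does not match."""
    match = pattern.match(line)
    return match.groups() if match else None


print(split_line(PATTERN, "timedFormMaxSeconds=3600"))
# ('timedFormMaxSeconds', '3600')

print(split_line(PATTERN, "akismetAPIKey="))
# ('akismetAPIKey', '')

# Made-up value containing "=": the greedy .* splits at the LAST equals sign.
print(split_line(PATTERN, "encryptionKey=abc=123"))
# ('encryptionKey=abc', '123')

print(split_line(STRICT, "encryptionKey=abc=123"))
# ('encryptionKey', 'abc=123')
```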
Then in the Replace section, we used: <entry key="$1"><value>$2</value></entry>
- This is just structured xml with the group numbers in the right places.
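One thing a plain replace template does not do is escape characters like & or < that mean something in XML. None of the values in this config contain any, so the Eclipse approach is fine here. If yours might, a scripted version could build the replacement with a function instead. This is only a sketch under that assumption: to_entry and convert are hypothetical names, and the greeting line is invented test data.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# The find pattern from this post.
PATTERN = re.compile(r"(.*[^=])=(.*)")


def to_entry(match: re.Match) -> str:
    """Build one <entry> element, escaping XML special characters."""
    key, value = match.group(1), match.group(2)
    # quoteattr adds the surrounding quotes and escapes the attribute value;
    # escape handles &, < and > in the element text.
    return f"<entry key={quoteattr(key)}><value>{escape(value)}</value></entry>"


def convert(text: str) -> str:
    """Apply the pattern with a function as the replacement."""
    return PATTERN.sub(to_entry, text)


print(convert("mouseMovement=1"))
# <entry key="mouseMovement"><value>1</value></entry>

# Invented line with an ampersand in the value.
print(convert("greeting=Tom & Jerry"))
# <entry key="greeting"><value>Tom &amp; Jerry</value></entry>
```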
So in plain English, we asked the regular expression find/replace gnome to: bust up each line into a pre-equals-sign group and a post-equals-sign group, and then put the content of each group inside the XML literal.
I’m sure you can agree this was much easier than a copy/paste extravaganza… I hope you enjoyed this (extended) blog series on Regex. If you want more of these, go bug Ben Nadel… His brain is a Möbius strip of interesting regular expression patterns…