Remove XML Tag Blocks from the command line with sed

By Dave Ceddia

I had an XML file that looked something like this, and I wanted to remove all the <meta> tags from it:

<xml>
  <note>
    <to>A</to>
    <from>B</from>
    <meta>
      junk
    </meta>
    <meta>
      more junk
    </meta>
    <body>
      keep this
    </body>
  </note>
  ...
</xml>

The sed utility made quick work of it.

Some caveats: The file was already well-formatted, and each <meta> block spanned multiple lines, with its opening and closing tags on separate lines.

If your file is a jumbled mess, you might want to format it with prettier first.

Manipulating XML or HTML with tools like sed is not generally a great idea. For a general-purpose solution that can deal with all valid XML syntax you’d need a proper XML parser. But if your file is in the right shape, sed can be a quick and dirty way to get the job done.

Here’s the command I ran:

sed -i '' -e '/<meta>/,/<\/meta>/d' my-file.xml

The -i means “in-place”: sed will change the file on disk. The '' is the suffix for a backup file; an empty string means no backup is kept. The Mac (BSD) version of sed requires this argument. GNU sed, the default on most Linux systems, takes the suffix attached directly to -i, so there you would leave the '' out.
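If you would rather keep a backup, or want one command that behaves the same on macOS and Linux, a minimal sketch is to attach a suffix directly to -i. The temp directory and sample file below are made up for illustration; only the sed line matters.

```shell
# Hypothetical sample file in a temp directory, so nothing real is touched.
workdir=$(mktemp -d)
file="$workdir/my-file.xml"
printf '%s\n' '<note>' '  <meta>' '    junk' '  </meta>' '  <body>keep this</body>' '</note>' > "$file"

# -i.bak (no space before the suffix) is accepted by both GNU sed and BSD sed.
# The untouched original is saved alongside as my-file.xml.bak.
sed -i.bak -e '/<meta>/,/<\/meta>/d' "$file"

cat "$file"
```

Once you have checked the result, you can delete the .bak file.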

The -e says that the argument that follows is the sed script to run.

Let’s break down the expression: /<meta>/,/<\/meta>/d

The comma in the middle tells sed to look for a range of lines, and on either side of the comma is a regex. The d at the end means “delete this range”. Read about ranges in sed for more stuff you can do with them.
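Before deleting anything in place, it can help to preview what the range matches. This is a small sketch using a made-up sample file: swapping d for p and adding -n prints only the lines inside the range, and running the delete without -i writes the result to stdout instead of changing the file.

```shell
# Hypothetical sample file, written to a temp location.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<note>
  <to>A</to>
  <meta>
    junk
  </meta>
  <body>
    keep this
  </body>
</note>
EOF

# -n suppresses the normal output; p prints only the lines in the matched range.
would_delete=$(sed -n -e '/<meta>/,/<\/meta>/p' "$sample")
printf '%s\n' "$would_delete"

# Without -i, sed prints the edited text and leaves the file on disk alone.
result=$(sed -e '/<meta>/,/<\/meta>/d' "$sample")
printf '%s\n' "$result"
```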

So the range starts at a line that contains <meta> and ends at the next line that contains </meta>. Because / is also the regex delimiter, the slash in the closing tag needs to be escaped in the second regex, so we have /<\/meta>/.
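One consequence of how ranges work: sed only starts looking for the end of a range on the line after the one where it began. If <meta> and </meta> sit on the same line, the range keeps going until a later line contains </meta>, or until the end of the file. The sketch below shows that on a made-up input, along with one possible workaround for this simple case (an assumption on my part, not something from the original command): delete the single-line blocks first, then the multi-line ranges.

```shell
# Hypothetical input where the whole <meta> block is on one line.
input=$(printf '%s\n' '<note>' '  <meta>junk</meta>' '  <body>keep this</body>' '</note>')

# The range opens on line 2 but never finds a later </meta>,
# so everything from line 2 to the end of the input is deleted.
greedy=$(printf '%s\n' "$input" | sed -e '/<meta>/,/<\/meta>/d')
printf '%s\n' "$greedy"

# Possible workaround: remove one-line blocks first. d ends processing of that
# line, so the range command never sees it and is not opened by mistake.
careful=$(printf '%s\n' "$input" | sed -e '/<meta>.*<\/meta>/d' -e '/<meta>/,/<\/meta>/d')
printf '%s\n' "$careful"
```

The first command leaves only the <note> line; the second keeps everything except the one-line <meta> block. Anything messier than this, such as several tags sharing a line, is where a real XML parser earns its keep.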