I had an XML file that looked something like this, and I wanted to remove all the <meta> tags from it:
<xml>
<note>
<to>A</to>
<from>B</from>
<meta>
junk
</meta>
<meta>
more junk
</meta>
<body>
keep this
</body>
</note>
...
</xml>
The sed utility made quick work of it.
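Some caveats: the file was already well-formatted, and each <meta> element spanned multiple lines, with its opening and closing tags on lines of their own. That matters because sed deletes whole lines, so anything sharing a line with a tag would be removed too.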
If your file is a jumbled mess, you might want to format it with prettier first.
Manipulating XML or HTML with tools like sed is not generally a great idea. For a general-purpose solution that can deal with all valid XML syntax you’d need a proper XML parser. But if your file is in the right shape, sed can be a quick and dirty way to get the job done.
Here’s the command I ran:
sed -i '' -e '/<meta>/,/<\/meta>/d' my-file.xml
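The -i means “in-place”: sed changes the file on disk. The '' is the suffix for a backup copy of the original; an empty string means no backup is kept. The Mac (BSD) version of sed requires this argument. GNU sed, the default on most Linux systems, expects the suffix attached directly to the flag instead, so there you would drop the '' and write just -i.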
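If you'd rather keep a backup, attaching a suffix directly to the flag (no space) should work on both the Mac and GNU versions. Here is a minimal sketch; the sample file contents and the .bak suffix are just assumptions for illustration:

```shell
# Create a small throwaway sample file (hypothetical content).
workdir=$(mktemp -d)
cat > "$workdir/sample.xml" <<'EOF'
<note>
<to>A</to>
<meta>
junk
</meta>
<body>
keep this
</body>
</note>
EOF

# -i.bak (suffix attached to the flag) is accepted by both BSD/macOS sed
# and GNU sed. The original is saved as sample.xml.bak before editing.
sed -i.bak -e '/<meta>/,/<\/meta>/d' "$workdir/sample.xml"

# Show the edited file; the untouched original sits next to it.
cat "$workdir/sample.xml"
```

If the result looks wrong, the .bak copy can simply be moved back over the edited file.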
The -e says that the next argument is a sed script (an expression) to execute.
Let’s break down the expression: /<meta>/,/<\/meta>/d
The comma in the middle tells sed to look for a range of lines, and on either side of the comma is a regex. The d at the end means “delete this range”. Read about ranges in sed for more stuff you can do with them.
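One thing you can do with the same range is preview it before deleting anything. A rough sketch, assuming a throwaway sample file: -n turns off sed's default printing, and the p command prints only the lines inside the range, so nothing on disk is changed.

```shell
# Throwaway sample file (hypothetical content).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<note>
<to>A</to>
<meta>
junk
</meta>
<meta>
more junk
</meta>
<body>
keep this
</body>
</note>
EOF

# -n suppresses the default output; p prints each line in the range.
# No -i here, so the file itself is left alone.
preview=$(sed -n -e '/<meta>/,/<\/meta>/p' "$sample")
printf '%s\n' "$preview"
```

What this prints is exactly what the d version would remove.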
So we’re looking for a range that starts on a line containing <meta> and ends on the next line containing </meta>. The slash needs to be escaped in the second regex, because / is also the regex delimiter, so we have /<\/meta>/.
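If the escaped slash bothers you, sed also lets an address regex use a different delimiter: start it with a backslash followed by the character you want to use. A small sketch, again on a made-up sample file, using | as the delimiter so the slash in </meta> needs no escaping:

```shell
# Throwaway sample file (hypothetical content).
sample=$(mktemp)
cat > "$sample" <<'EOF'
<note>
<to>A</to>
<meta>
junk
</meta>
<meta>
more junk
</meta>
<body>
keep this
</body>
</note>
EOF

# \| opens an address regex delimited by | instead of /,
# so the / inside </meta> is just an ordinary character.
cleaned=$(sed -e '\|<meta>|,\|</meta>|d' "$sample")

# The original form with the escaped slash, for comparison.
escaped=$(sed -e '/<meta>/,/<\/meta>/d' "$sample")

printf '%s\n' "$cleaned"
```

Both forms should produce the same output; which one reads better is mostly a matter of taste.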