@jdevalk
Created July 2, 2012 18:41
Regex to match meta description in content
<?php
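// Double-quoted attributes; [^"]* keeps each value from running past its closing quote.
preg_match_all( '#<meta (name|content)="([^"]*)" (name|content)="([^"]*)"(\s+)?/?>#i', $content, $matches, PREG_SET_ORDER );
// Single-quoted attributes, same idea.
preg_match_all( "#<meta (name|content)='([^']*)' (name|content)='([^']*)'(\s+)?/?>#i", $content, $matches2, PREG_SET_ORDER );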

silasrm commented Jan 23, 2019

What about things like this?
<meta http-equiv="Content-Type" content="text/html" >

Since we're specifically looking for the name/content for the meta description and (I'm assuming) keywords, I think catching content types is a bit much.

How about terrible, terrible whitespace?
< meta name="keywords" content = "wikipedia,encyclopedia" >

That all depends on how many spaces you want to filter for. But you could just as easily add \s* before and after any words. The following would catch the specific situation you posted (it already matched the missing / at the end):

preg_match_all( '#<\s*meta\s*(name|content)\s*=\s*("|')(.*)("|')\s*(name|content)\s*=\s*("|')(.*)("|')(\s+)?/?>#i', $content, $matches, PREG_SET_ORDER );

You need to escape the single quotes!
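
For anyone landing here later, this is a rough sketch of what an escaped version of the whitespace-tolerant pattern could look like. It is not the exact pattern posted above. `find_meta_pairs` is a made-up helper name, and the `("|')` groups are swapped for a `["\']` character class with backreferences, so a value has to close with the same quote that opened it. Treat it as one possible approach under those assumptions, not the gist author's method.

```php
<?php
// Hypothetical helper (not part of the original gist): returns one
// associative array per <meta> tag that carries a name/content pair.
function find_meta_pairs( $content ) {
    // Inside a single-quoted PHP string, \' is the escaped single quote.
    // Groups: 1 = first attribute, 2 = its quote, 3 = its value,
    //         4 = second attribute, 5 = its quote, 6 = its value.
    // (?:(?!\2).)* consumes characters up to the matching closing quote.
    $pattern = '#<\s*meta\s+(name|content)\s*=\s*(["\'])((?:(?!\2).)*)\2'
             . '\s+(name|content)\s*=\s*(["\'])((?:(?!\5).)*)\5\s*/?>#i';

    preg_match_all( $pattern, $content, $matches, PREG_SET_ORDER );

    $pairs = array();
    foreach ( $matches as $m ) {
        // Key each value by its attribute name, so attribute order does not matter.
        $pairs[] = array(
            strtolower( $m[1] ) => $m[3],
            strtolower( $m[4] ) => $m[6],
        );
    }
    return $pairs;
}

// Example call using the whitespace-heavy tag from the comment above.
print_r( find_meta_pairs( '< meta name="keywords" content = "wikipedia,encyclopedia" >' ) );
```

Like the original, this only looks at tags whose first two attributes are `name` and `content` in either order, so the `http-equiv` tag from the earlier question is still skipped.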
