@JaimeObregon
Created January 15, 2020 22:09
Regular expression I used to process RTVE's index of telediarios (news broadcasts).
<?php
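// Pattern fragments, in document order, for one entry of the index.
// Named groups capture: url, title, type, duration, date, description.
const REGEX = [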
'<span class="col_tit" id=".+?" name="progname">',
'<a href="(?P<url>.+?)">',
'(<em>Nuevo<\/em>&nbsp;)?',
'(?P<title>.+?)',
'<\/a>',
'<\/span>',
'<span class="col_tip">',
'<span>(?P<type>.+?)<\/span>',
'<\/span>',
'<span class="col_dur">(?P<duration>.+?)<\/span>',
'<span class="col_pop">.+?<\/span>',
'<span class="col_fec">(?P<date>.+?)<\/span>',
'<div id=".+?" class="tultip hddn">',
'.*?',
'<span class="detalle">',
'(?P<description>.+?)',
'<\/span>',
'.*?',
'<\/div>',
'<\/li>',
];
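// Join the fragments with '\s*' so any whitespace between tags is tolerated,
// then wrap the result in '/' delimiters (hence the escaped '\/' above).
$regex = sprintf('/%s/', implode('\s*', REGEX));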
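
// Usage sketch, not part of the original gist: one possible way to apply the
// assembled pattern to the downloaded index page. The function name
// parseIndex() and its parameters are assumptions made for illustration.
// Without the "s" modifier "." does not match newlines, so this assumes the
// free-form parts of each entry contain no line breaks.
function parseIndex(string $regex, string $html): array
{
    // PREG_SET_ORDER groups the captures per matched entry.
    if (preg_match_all($regex, $html, $matches, PREG_SET_ORDER) === false) {
        throw new RuntimeException('preg_match_all failed: ' . preg_last_error());
    }

    // Keep only the named groups (string keys), dropping the numeric ones.
    return array_map(
        fn (array $match): array => array_filter($match, 'is_string', ARRAY_FILTER_USE_KEY),
        $matches
    );
}

// Hypothetical call: $entries = parseIndex($regex, $html);
// Each element of $entries would then hold the named captures of one entry.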