Created January 15, 2020 22:09
Regular expression I used to process the RTVE newscast (telediario) index.
<?php
// Fragments of the pattern, roughly one per HTML element of an index entry.
// Named groups (?P<name>...) capture the fields of interest.
const REGEX = [
    '<span class="col_tit" id=".+?" name="progname">',
    '<a href="(?P<url>.+?)">',
    '(<em>Nuevo<\/em> )?', // optional "Nuevo" ("New") badge before the title
    '(?P<title>.+?)',
    '<\/a>',
    '<\/span>',
    '<span class="col_tip">',
    '<span>(?P<type>.+?)<\/span>',
    '<\/span>',
    '<span class="col_dur">(?P<duration>.+?)<\/span>',
    '<span class="col_pop">.+?<\/span>', // matched but not captured
    '<span class="col_fec">(?P<date>.+?)<\/span>',
    '<div id=".+?" class="tultip hddn">',
    '.*?',
    '<span class="detalle">',
    '(?P<description>.+?)',
    '<\/span>',
    '.*?',
    '<\/div>',
    '<\/li>',
];
// Join the fragments with optional whitespace between them and add delimiters.
$regex = sprintf('/%s/', implode('\s*', REGEX));
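The gist stops once the pattern is assembled. As a rough sketch of how such a pattern might be applied, the example below uses a reduced set of the fragments with `preg_match_all` and reads the named groups. The constant name `FRAGMENTS`, the `$html` sample, and the `$items` array are assumptions made for illustration. The markup is invented and is not real RTVE output.

```php
<?php
// Reduced sketch: only a subset of the original fragments is used here.
const FRAGMENTS = [
    '<span class="col_tit" id=".+?" name="progname">',
    '<a href="(?P<url>.+?)">',
    '(<em>Nuevo<\/em> )?',
    '(?P<title>.+?)',
    '<\/a>',
    '<\/span>',
    '<span class="col_dur">(?P<duration>.+?)<\/span>',
];

// Same assembly as in the gist: fragments joined by optional whitespace.
$regex = sprintf('/%s/', implode('\s*', FRAGMENTS));

// Hypothetical markup, invented for this example.
$html = <<<'HTML'
<li>
  <span class="col_tit" id="t1" name="progname">
    <a href="/example/1/"><em>Nuevo</em> Example title one</a>
  </span>
  <span class="col_dur">00:30</span>
</li>
<li>
  <span class="col_tit" id="t2" name="progname">
    <a href="/example/2/">Example title two</a>
  </span>
  <span class="col_dur">01:00</span>
</li>
HTML;

// PREG_SET_ORDER yields one array per matched entry,
// with the named groups available as string keys.
preg_match_all($regex, $html, $matches, PREG_SET_ORDER);

$items = [];
foreach ($matches as $match) {
    $items[] = [
        'url' => $match['url'],
        'title' => $match['title'],
        'duration' => $match['duration'],
    ];
}

print_r($items);
```

Because the pattern is built without the `s` modifier, `.` does not match newlines, so each captured field has to sit on a single line of the input. The `\s*` joins are what let the pattern span the line breaks between elements.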