PHP regexp: replace word(s) in an HTML string if not inside tags

The problem was to find and replace text inside HTML without breaking the HTML. Take, for example, this string:

<img title="My image" alt="My image" src="/gfx/this is my image.gif"><p>This is my string</p>

Say you want to replace the string “my” with another string, or enclose it in another tag (let’s assume <strong></strong>), but only where “my” appears outside the HTML tags. After the transformation it should look like this:

<img title="My image" alt="My image" src="/gfx/this is my image.gif"><p>This is <strong>my</strong> string</p>

With PHP’s regular expression functions, the typical solution, a find and replace with word boundaries, fails here:

preg_replace('/\b(my)\b/i',
             '<strong>$1</strong>',
             $html_string);

You will end up with messed-up HTML:

<img title="<strong>My</strong> image" alt="<strong>My</strong> image" src="/gfx/this is <strong>my</strong> image.gif"><p>This is <strong>my</strong> string</p>

Now imagine the wonderful mess if you were replacing words like “form” or “alt”, which can be a text node, an HTML tag, or an attribute.

So how do you fix this? I figured that the only thing common to all tags is the opening and closing characters, < and >. From there, you search for the word you want to replace plus everything up to the next closing character (the > sign). Inside that match, you look for an opening character (<). If there is none, the > closes a tag that the word is inside, so you skip the replacement. Here is the code:

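// Called once per match of "word, then everything up to the next '>'".
function checkOpenTag($matches) {
    if (strpos($matches[0], '<') === false) {
        // No '<' before the next '>': the word is inside a tag, keep it.
        return $matches[0];
    } else {
        // A '<' comes first, so the word is in a text node: wrap it.
        return '<strong>'.$matches[1].'</strong>'.$matches[2];
    }
}

// The "s" modifier lets ".*?" cross line breaks in multi-line HTML.
$html_string = preg_replace_callback('/(\bmy\b)(.*?>)/is',
                                     'checkOpenTag',
                                     $html_string);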
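The same test (is the next angle bracket after the word a > rather than a <?) can also be written as a regex lookahead instead of a callback. The Python sketch below uses that lookahead technique. It is not the PHP code above. The pattern `(?![^<>]*>)` rejects a match when only non-bracket characters stand between the word and a >, which is exactly the "no < before the next >" condition. The lookahead consumes no characters, so a single pass checks every occurrence. It also handles text after the last tag. Like the PHP version, it assumes attribute values never contain a literal < or >, and it ignores comments and <script> contents. `wrap_outside_tags` is a hypothetical name.

```python
import re


def wrap_outside_tags(html: str, word: str, tag: str = "strong") -> str:
    """Wrap every whole-word, case-insensitive occurrence of `word` that is
    not inside an HTML tag. Assumes no literal '<' or '>' inside attributes."""
    # (?![^<>]*>) : fail if the next angle bracket after the word is '>',
    # i.e. the word sits inside a tag such as <img alt="...">.
    pattern = re.compile(r"\b(" + re.escape(word) + r")\b(?![^<>]*>)", re.IGNORECASE)
    # \1 keeps the original capitalisation of the matched word.
    return pattern.sub(rf"<{tag}>\1</{tag}>", html)


page = '<img title="My image" alt="My image" src="/gfx/this is my image.gif"><p>This is my string</p>'
print(wrap_outside_tags(page, "my"))
```

On the example string from the top of this post, the <img> attributes are left untouched and only the “my” in the paragraph is wrapped.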

If you are going to use this kind of code to search for several words in an HTML text (e.g. a glossary implementation), test for performance and consider a caching system.
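For the glossary case, one option is to put all the terms into a single alternation so the HTML is scanned only once. The compiled pattern can then be cached with `functools.lru_cache`. The Python sketch below does this. It is not the author's code. It checks the "no < before the next >" condition with the negative lookahead `(?![^<>]*>)` rather than a callback. The following are assumptions for illustration: the `link_glossary` name, the `<abbr>` markup, a glossary dict with lowercase keys, and terms that start and end with word characters (so `\b` applies). The sketch does not cache the rendered HTML per page, which is the other kind of caching the post suggests.

```python
import html as html_lib
import re
from functools import lru_cache


@lru_cache(maxsize=32)
def _glossary_pattern(words: tuple):
    # Longest terms first, so a phrase like "my image" wins over "my".
    alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    # Match any term, but only if the next angle bracket is not '>'.
    return re.compile(r"\b(" + alternatives + r")\b(?![^<>]*>)", re.IGNORECASE)


def link_glossary(html: str, glossary: dict) -> str:
    """Wrap each glossary term found outside tags in <abbr title="...">.
    Keys of `glossary` are assumed to be lowercase terms."""
    # Sorted tuple = stable, hashable cache key for the compiled pattern.
    pattern = _glossary_pattern(tuple(sorted(glossary)))

    def replace(match) -> str:
        term = match.group(1)
        title = html_lib.escape(glossary[term.lower()])
        return f'<abbr title="{title}">{term}</abbr>'

    # One pass; inserted markup is never re-scanned.
    return pattern.sub(replace, html)


terms = {"html": "HyperText Markup Language", "php": "PHP: Hypertext Preprocessor"}
print(link_glossary('<p class="html">PHP renders HTML</p>', terms))
```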

That’s it. Remember: just as this solution worked fine for me, it can also work terribly for you, so proceed at your own risk (aka liability disclaimer).

UPDATE 19-04-14
A comment on this post warned that only the first occurrence in an HTML segment was being replaced. Here is an updated version of the PHP example with this issue corrected:

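<?php

class replaceIfNotInsideTags {

  // Search word, already escaped for use inside the regex.
  private $word = '';

  private function checkOpenTag($matches) {
    if (strpos($matches[0], '<') === false) {
      // No '<' before the next '>': the word is inside a tag, keep it.
      return $matches[0];
    } else {
      // Text node: wrap the word, then keep replacing in the rest of the
      // match, which may contain further occurrences.
      return '<strong>'.$matches[1].'</strong>'.$this->doReplace($matches[2]);
    }
  }

  private function doReplace($html) {
    return preg_replace_callback('/(\b'.$this->word.'\b)(.*?>)/is',
                                 array($this, 'checkOpenTag'),
                                 $html);
  }

  public function replace($html, $word) {
    // preg_quote() makes characters such as '.' or '(' in $word match literally.
    $this->word = preg_quote($word, '/');

    return $this->doReplace($html);
  }
}

$html = '<p>my bird is my life is my dream</p>';

$obj = new replaceIfNotInsideTags();
echo $obj->replace($html, 'my');
// <p><strong>my</strong> bird is <strong>my</strong> life is <strong>my</strong> dream</p>

?>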
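The problem the commenter pointed out is easy to reproduce. Below is a rough Python port of the first callback version and of the recursive fix. The function names are hypothetical. `re.sub` resumes scanning after each match, and `(.*?>)` swallows everything up to the next >. As a result, later occurrences in the same text run are never seen by the first version. The second version runs the replacement again on the swallowed tail, as `doReplace($matches[2])` does. `re.DOTALL` (the PHP `s` modifier) lets `.*?` cross line breaks.

```python
import re


def _pattern(word: str):
    # The word, then everything up to the next '>' (DOTALL crosses newlines).
    return re.compile(r"(\b" + re.escape(word) + r"\b)(.*?>)", re.IGNORECASE | re.DOTALL)


def replace_first_version(html: str, word: str) -> str:
    def check_open_tag(m):
        if "<" not in m.group(0):  # inside a tag: keep as-is
            return m.group(0)
        # The tail m.group(2) is copied verbatim, so later occurrences are missed.
        return "<strong>" + m.group(1) + "</strong>" + m.group(2)

    return _pattern(word).sub(check_open_tag, html)


def replace_recursive(html: str, word: str) -> str:
    def check_open_tag(m):
        if "<" not in m.group(0):
            return m.group(0)
        # Recurse into the swallowed tail, like doReplace($matches[2]).
        return "<strong>" + m.group(1) + "</strong>" + replace_recursive(m.group(2), word)

    return _pattern(word).sub(check_open_tag, html)


sample = "<p>my bird is my life is my dream</p>"
print(replace_first_version(sample, "my"))
# <p><strong>my</strong> bird is my life is my dream</p>
print(replace_recursive(sample, "my"))
# <p><strong>my</strong> bird is <strong>my</strong> life is <strong>my</strong> dream</p>
```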

Lisbon Half Marathon

Finally, a sub-2-hour half marathon, 1h57m53s exactly, with a lot of mixed feelings.

The first couple of kilometers were simply impossible to run properly: just a big gymkhana with all kinds of non-runners in the way, from baby strollers to old ladies walking hand in hand so as not to lose each other, and plenty of other characters in between. In the second third of the race I made very good time, with many sub-5-minute kilometers, probably a sub-50-minute 10K (I couldn’t get all the splits). But I dropped the hammer a bit too soon, so the last 4 km were hard, and I stopped and walked a couple of times. Maybe it was the mind at work: somewhere along the way the strong pace started to feel harder than what I was mentally prepared to cope with.

Except for the blisters on my feet (as usual...), everything was OK at the finish: legs, knees, muscles. Cool. At this point, sub 1h50 seems really doable, without any major change in training or lifestyle.

Anyway, probably not in this race. It is certainly a very scenic and fun course, and the weather is usually fine this time of year (today was a bit too warm, though). But I’ll have to rethink it for next year. It took 45 minutes in line to get the bib numbers, and they ran out of timing chips (!?), so there is no official time for me and others. On race day it took 45 minutes to walk/crawl the 500 meters from the train station to the start. The first kilometers you don’t run, you gymkhana, and near Praça do Comércio it was gymkhana again. I missed 3 water stations due to all the confusion, and at the end it took another 30 minutes just to get through to the exit...

Next running objectives:
  • sub-50-minute 10K
  • sub-5-minute 1500 m (that one is very hard)

MySQL: split column string into rows

A MySQL recipe you can use to split a cell value into separate rows by a known separator, somewhat like PHP’s explode function or split in Perl.

To turn this:

id   value
1    4,5,7
2    4,5
3    4,5,6
…    …

Into this:

id   value
1    4
1    5
1    7
2    4
2    5
3    4
3    5
3    6
…    …

You can simply write and call a stored procedure:

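DELIMITER $$

DROP PROCEDURE IF EXISTS explode_table $$
CREATE PROCEDURE explode_table(bound VARCHAR(255))

  BEGIN

    -- Local variables are prefixed so they never shadow column names.
    DECLARE v_id INT DEFAULT 0;
    DECLARE v_value TEXT;
    DECLARE occurrences INT DEFAULT 0;
    DECLARE i INT DEFAULT 0;
    DECLARE split_value VARCHAR(255);
    DECLARE done INT DEFAULT 0;
    DECLARE cur1 CURSOR FOR SELECT table1.id, table1.value
                            FROM table1
                            WHERE table1.value != '';
    DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

    -- The exploded rows are collected in an in-memory temporary table.
    DROP TEMPORARY TABLE IF EXISTS table2;
    CREATE TEMPORARY TABLE table2(
    `id` INT NOT NULL,
    `value` VARCHAR(255) NOT NULL
    ) ENGINE=Memory;

    OPEN cur1;
      read_loop: LOOP
        FETCH cur1 INTO v_id, v_value;
        IF done THEN
          LEAVE read_loop;
        END IF;

        -- Pieces = separators + 1. Dividing by the separator length keeps the
        -- count right for multi-character separators; CHAR_LENGTH counts
        -- characters (LENGTH counts bytes), matching SUBSTRING's positions.
        SET occurrences = (CHAR_LENGTH(v_value)
                           - CHAR_LENGTH(REPLACE(v_value, bound, '')))
                          DIV CHAR_LENGTH(bound) + 1;
        SET i = 1;
        WHILE i <= occurrences DO
          -- Cut the first i-1 pieces off the first i pieces, then strip
          -- the leading separator (bound, not a hard-coded ',').
          SET split_value =
            REPLACE(SUBSTRING(SUBSTRING_INDEX(v_value, bound, i),
                    CHAR_LENGTH(SUBSTRING_INDEX(v_value, bound, i - 1)) + 1),
                    bound, '');

          INSERT INTO table2 VALUES (v_id, split_value);
          SET i = i + 1;

        END WHILE;
      END LOOP;
    CLOSE cur1;

    SELECT * FROM table2;
  END $$

DELIMITER ;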

Then you simply call it:

CALL explode_table(',');
There you have the bare bones. From here it’s simple to adapt it to your own needs, such as adding a filter parameter, ordering, etc. If your main interface to MySQL is phpMyAdmin, forget it (as of now): it’s rubbish with these stored procedure queries. You can use MySQL’s own GUI, MySQL Workbench, or rely on the old ‘mysql’ CLI command. Just put the stored procedure definition in a file and load it with a redirect:

mysql -u username -p -D databasename < procedure_definition_file.txt
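To show how the string arithmetic in the procedure’s inner loop works, here is a rough Python emulation. `substring_index` mimics MySQL’s SUBSTRING_INDEX for non-negative counts. It is an approximation, not MySQL’s implementation. `explode_rows` mirrors the loop. It first counts the pieces. Then, for each i, it cuts the first i-1 pieces off the first i pieces and strips the separator. Python’s `len` counts characters, like MySQL’s CHAR_LENGTH (LENGTH counts bytes, which differs for multibyte text). The count is divided by the separator length so that multi-character separators work. For a one-character separator such as ',', this equals the plain length difference. The function names are hypothetical, and `table1` is sample data matching the tables above.

```python
def substring_index(s: str, delim: str, count: int) -> str:
    """Approximate MySQL SUBSTRING_INDEX for count >= 0 and a non-empty,
    non-overlapping delimiter: everything left of the count-th delimiter."""
    if count == 0:
        return ""
    parts = s.split(delim)
    # If there are fewer delimiters than count, the whole string comes back.
    return delim.join(parts[:count])


def explode_rows(rows, bound):
    """Mirror of the procedure's loop: one (id, piece) per separated value."""
    result = []
    for row_id, value in rows:
        if value == "":  # the cursor skips empty values
            continue
        # Pieces = separators + 1 (divided by the separator's length).
        occurrences = (len(value) - len(value.replace(bound, ""))) // len(bound) + 1
        for i in range(1, occurrences + 1):
            prefix = substring_index(value, bound, i)        # first i pieces
            previous = substring_index(value, bound, i - 1)  # first i-1 pieces
            # Drop the shorter prefix, then remove the leading separator.
            piece = prefix[len(previous):].replace(bound, "")
            result.append((row_id, piece))
    return result


table1 = [(1, "4,5,7"), (2, "4,5"), (3, "4,5,6")]
for row in explode_rows(table1, ","):
    print(*row)
```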

Also remember:

  • if backups are made with mysqldump, use the --routines switch so that stored procedure definitions are included in the dumps.
  • it works with MySQL >= 5.0 only.
  • performance, normalization and concurrency: this is not the correct way to model a many-to-many relationship in an RDBMS. You should use a relationship table and joins to work with it.
  • OK, so your project manager/marketing/boss changed the rules of the game at the very last moment, and implementing it correctly means reworking a lot of code. I understand 🙂 but even then, go down this road at your own peril.
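The relationship-table alternative from the list above can be sketched with Python’s built-in sqlite3 module standing in for MySQL. SQLite is an assumption here, and the DDL needs minor type changes for MySQL. The table names `item`, `label` and `item_label` are hypothetical. Each (id, value) pair becomes its own row, so the “exploded” result is a plain join. Lookups by value can then use an index instead of parsing comma-separated strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE item  (id INTEGER PRIMARY KEY);
    CREATE TABLE label (id INTEGER PRIMARY KEY);
    -- One row per (item, label) pair instead of a '4,5,7' string.
    CREATE TABLE item_label (
        item_id  INTEGER NOT NULL REFERENCES item(id),
        label_id INTEGER NOT NULL REFERENCES label(id),
        PRIMARY KEY (item_id, label_id)
    );
    CREATE INDEX item_label_by_label ON item_label(label_id);
""")
conn.executemany("INSERT INTO item VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO label VALUES (?)", [(4,), (5,), (6,), (7,)])
conn.executemany(
    "INSERT INTO item_label VALUES (?, ?)",
    [(1, 4), (1, 5), (1, 7), (2, 4), (2, 5), (3, 4), (3, 5), (3, 6)],
)

# The "exploded" view is now a plain join: no string parsing required.
exploded = conn.execute("""
    SELECT item.id, label.id
    FROM item
    JOIN item_label ON item_label.item_id = item.id
    JOIN label      ON label.id = item_label.label_id
    ORDER BY item.id, label.id
""").fetchall()

# "Which items have value 5?" is an indexed lookup on the relationship table.
with_5 = [row[0] for row in conn.execute(
    "SELECT item_id FROM item_label WHERE label_id = ? ORDER BY item_id", (5,))]

print(exploded)
print(with_5)  # [1, 2, 3]
```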

Rework by 37 Signals

As Gordon Gekko once said, “Because everyone is drinking the same Kool-Aid”. And just because everybody in the business (the web business, I mean) is drinking “Rework” by 37 Signals, I drank it too... So how does this book taste?

It’s a book with a complex taste. That’s not because it goes deep down the rabbit hole, but because it (tries to) speak about everything in the business universe. It goes from (unordered list) planning to meetings, time management, customer management, task prioritization, hiring and firing, office policies, marketing, product building, product minimalism, workaholism, by-products, productivity, startups, etc.

It’s filled with common sense (which is not so common) and lapalissades, which make one feel smart:

«Failure is not a prerequisite of success.» – I knew that

«Forgoing sleep is a bad idea.» – I also knew that

«Other people’s failures are just that: other people’s failures» – Duhh

«Revenue in, expenses out. Turn a profit or wind up gone.» – Heck, even the tavern owner where I go for cheap drinks knows this

«If you want to get someone’s attention, it’s silly to do exactly the same thing as everyone else.» – I rest my case

On the other hand, you are constantly doing reality checks, comparing your own practices with the ones described in the book, and that kind of review is obviously good.

Anyway, the “work smart, not hard” philosophy makes sense, and there are some good marketing tips. I really liked the “teach and spread your trade secrets” approach. It also makes a strong point about minimalist products: products stripped down to the core, which makes them easier, cheaper and more maintainable. Likewise, they don’t like guys stuffed into suits (about#hate) or useless meetings (about#hate)... ahhh... that was good for my ego.

The final balance is positive. Everyone can get good ideas out of it, but it is not the “fabulous”, “best book in my life and afterlife” hype you read in Amazon reviews.

Here are some of my favorite quotes:

«When you treat people like children, you get children’s work»

«And when everything is high priority, nothing is»

«Businesses are usually paranoid and secretive. They think they have proprietary this and competitive advantage that. Maybe a rare few do, but most don’t»

«Having the idea for eBay has nothing to do with actually creating eBay»

«The worst interruptions of all are meetings»

«How long someone’s been doing it is overrated. What matters is how well they’ve been doing it»