Content Processing
All WordPress post content goes through Model/ContentProcessor.php before being rendered. The pipeline has four steps applied in sequence.
Pipeline Overview

    Raw WP content (HTML string)
      -> Step 1: rewriteInternalLinks()
      -> Step 2: addLazyLoading()
      -> Step 3: stripShortcodes()
      -> Step 4: sanitizeDangerousHtml()
      -> Processed content (HTML string)

Step 1: Link Rewriting
WordPress internal links (links pointing to the WordPress domain) are rewritten to use the Magento blog URL.
Before:

    <a href="https://blog.example.com/my-post">Read more</a>
    <a href="https://blog.example.com/2024/01/15/my-post">Old permalink</a>

After:

    <a href="/blog/my-post">Read more</a>
    <a href="/blog/my-post">Old permalink</a>

The regex matches href attributes pointing to the configured WordPress URL. Date-based permalink prefixes (YYYY/MM/DD/) are stripped from paths before rewriting.
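The rewrite described above can be sketched roughly as follows. This is a minimal illustration, not the code in Model/ContentProcessor.php: the function signature, the `$wpUrl` and `$blogPath` parameters, and both regular expressions are assumptions made for the example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the link-rewriting step (not the module's actual code).
 *
 * @param string $html     Raw post content
 * @param string $wpUrl    Configured WordPress base URL (assumed parameter)
 * @param string $blogPath Magento blog path prefix (assumed parameter)
 */
function rewriteInternalLinks(string $html, string $wpUrl, string $blogPath = '/blog'): string
{
    // Match href="<wpUrl>/<path>" with either quote style and capture the path.
    $pattern = '~href=(["\'])' . preg_quote(rtrim($wpUrl, '/'), '~') . '/([^"\']*)\1~i';

    return preg_replace_callback(
        $pattern,
        static function (array $m) use ($blogPath): string {
            // Drop a leading date-based permalink prefix such as 2024/01/15/.
            $path = preg_replace('~^\d{4}/\d{2}/\d{2}/~', '', $m[2]);

            return 'href=' . $m[1] . $blogPath . '/' . $path . $m[1];
        },
        $html
    ) ?? $html;
}
```

Calling `rewriteInternalLinks($html, 'https://blog.example.com')` on the "Before" markup should yield the "After" markup. Links to any other domain do not match the pattern and are left unchanged.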
Step 2: Lazy Loading
loading="lazy" is added to all <img> tags that do not already have a loading attribute.
Before:

    <img src="photo.jpg" alt="Photo">
    <img src="hero.jpg" alt="Hero" loading="eager">

After:

    <img loading="lazy" src="photo.jpg" alt="Photo">
    <img src="hero.jpg" alt="Hero" loading="eager">

The regex uses a negative lookahead to skip images that already have a loading attribute.
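One way such a negative lookahead could be written is sketched below. The exact pattern used by the module is not shown in this page, so the expression here is an assumption chosen to reproduce the Before/After example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the lazy-loading step (not the module's actual code).
 */
function addLazyLoading(string $html): string
{
    // (?![^>]*\sloading\s*=) is the negative lookahead: it rejects any <img> tag
    // that already contains a whitespace-preceded loading= attribute before its closing ">".
    $pattern = '~<img\b(?![^>]*\sloading\s*=)~i';

    return preg_replace($pattern, '<img loading="lazy"', $html) ?? $html;
}
```

Requiring whitespace before `loading` keeps attributes such as `data-loading` from being mistaken for the native attribute.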
Step 3: Shortcode Stripping
Any WordPress shortcodes that WordPress did not expand into HTML are removed. This prevents text like [gallery ids="1,2,3"] or [/caption] appearing in the rendered output.
The pattern /\[\/?\w[^\]]*\]/ matches both opening ([foo]) and closing ([/foo]) shortcode syntax.
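Applied with `preg_replace`, the pattern above behaves as in this short sketch. The pattern is the one quoted in the text; the function wrapper is a hypothetical stand-in, not the module's code.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the shortcode-stripping step.
 */
function stripShortcodes(string $html): string
{
    // \[     opening bracket
    // \/?    optional slash for closing shortcodes
    // \w     first character must be a word character
    // [^\]]* anything up to the closing bracket
    return preg_replace('/\[\/?\w[^\]]*\]/', '', $html) ?? $html;
}
```

Because the pattern only requires a word character after the bracket, ordinary bracketed text such as `[1]` or `[sic]` is removed as well, which is worth keeping in mind for posts that use bracketed footnote markers.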
Step 4: HTML Sanitization
Four categories of dangerous HTML constructs are removed or neutralized:
| What | Pattern | Action |
|---|---|---|
| Script blocks | <script>...</script> | Remove entire block including content |
| Style blocks | <style>...</style> | Remove entire block including content |
| Event handlers | onclick=, onload=, etc. | Remove attribute |
| JS protocol | href="javascript:..." | Replace with href="#" |
This is a lightweight pass. If your use case requires stricter sanitization (e.g., user-submitted content), consider adding HTMLPurifier as an additional step after this pipeline.
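The four rules in the table could be expressed along these lines. This is a rough sketch only: the regular expressions are assumptions written for illustration, not the ones in Model/ContentProcessor.php, and like any regex-based filter they are easy to bypass with unusual markup.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for the sanitization step (patterns are illustrative).
 */
function sanitizeDangerousHtml(string $html): string
{
    $rules = [
        // Script and style blocks: remove the tags and everything between them.
        '~<script\b[^>]*>.*?</script\s*>~is' => '',
        '~<style\b[^>]*>.*?</style\s*>~is' => '',
        // Inline event handlers (onclick=, onload=, ...): remove the attribute.
        '~\s+on[a-z]+\s*=\s*(?:"[^"]*"|\'[^\']*\'|[^\s>]+)~i' => '',
        // javascript: URLs in href: replace with a harmless fragment link.
        '~href\s*=\s*(["\'])\s*javascript:[^"\']*\1~i' => 'href="#"',
    ];

    // Patterns are applied in array order, so blocks are removed before attributes.
    return preg_replace(array_keys($rules), array_values($rules), $html) ?? $html;
}
```

The event-handler rule is not aware of tag boundaries, so it can also strip text that merely looks like an `on...=` attribute. That limitation is one reason a dedicated sanitizer is preferable for untrusted input.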
Extending the Pipeline
To add a custom processing step, create a plugin on ContentProcessor:

    class MyContentPlugin
    {
        public function afterProcess(
            ContentProcessor $subject,
            string $result
        ): string {
            // Add custom processing
            return str_replace('old-domain.com', 'new-domain.com', $result);
        }
    }