The uncertainty of software development bites again

A task I expected to take an hour has taken ten, with no end in sight.

Several months ago, I ran a small experiment on LinkedIn, which proved to boost engagement (and email signups) significantly.

In a nutshell, rather than posting a link to one of my daily messages (like this one), I’d post the full message contents. Then in the first comment, I’d put a link to the full article, or a prompt to sign up for my daily email list.

However, this involved a lot of manual work, because Zapier (which handles the auto-posting for me) doesn’t give me any way to create a comment on a LinkedIn post.

Fast forward to Tuesday, and I discovered that Pipedream makes it easy to automatically add a comment to a LinkedIn post. Magic!

So with the proof-of-concept in my pocket, I set out to begin automating this.

The next hurdle to overcome: LinkedIn posts should contain plain text only. No HTML tags.

It’s easy enough to add a <content:plain> field to my RSS feed for Pipedream to read. So I used Hugo’s plainify function to strip HTML tags. And that works… mostly.

Except that it generates invalid XML any time my text contains &, <, or > characters. Well, that’s nothing a little manual escaping can’t fix… done.
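To illustrate the kind of escaping involved, here is a minimal Go sketch. It is not the Hugo template change described in this post; it only shows how the same three characters can be made safe for XML with the standard library's xml.EscapeText. The helper name escapeForXML is made up for this example.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escapeForXML is a hypothetical helper (not part of Hugo or Pipedream).
// It replaces characters that are special in XML, such as &, < and >,
// with entity references. Note that xml.EscapeText also escapes quotes,
// tabs and newlines (a newline becomes &#xA;).
func escapeForXML(s string) (string, error) {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := escapeForXML("Q&A: is 1 < 2 > 0?")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints: Q&amp;A: is 1 &lt; 2 &gt; 0?
}
```

Escaping has to happen after the HTML tags are stripped. Otherwise the stripping step would see the escaped text rather than the tags.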

Only now… I’ve discovered that all paragraph breaks are missing.

<p>Good day!</p>
<p></p>

<p>How are you?</p>

gets rendered as:

Good day! How are you?

Bleh.

And what’s worse, lists are rendered unreadable.

<ul>
  <li>First item</li>
  <li>Second item</li>
</ul>

Becomes:

First item Second item

Okay. So I need something a bit smarter than just stripping HTML tags.
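To make "a bit smarter" concrete, here is a rough Go sketch of a converter that keeps paragraph breaks and list items instead of flattening everything onto one line. It is not the solution this post ends up with (there isn't one yet), and it is not the API of any library mentioned here. It assumes small, reasonably well-formed fragments and leans on the standard library's encoding/xml decoder in its lenient HTML mode; messy real-world HTML would need a proper HTML parser. The names htmlToText and block, and the "- " list marker, are choices made for this example.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// block is one paragraph or list item of output (hypothetical type).
type block struct {
	text string
	item bool // true when the block came from an <li>
}

// htmlToText converts a small HTML fragment to plain text, separating
// paragraphs with a blank line and list items with a single newline.
func htmlToText(fragment string) (string, error) {
	// Wrap the fragment in a root element so the decoder sees one document.
	dec := xml.NewDecoder(strings.NewReader("<root>" + fragment + "</root>"))
	dec.Strict = false               // tolerate common HTML sloppiness
	dec.AutoClose = xml.HTMLAutoClose // e.g. <br> has no closing tag
	dec.Entity = xml.HTMLEntity       // e.g. &nbsp;

	var blocks []block
	var cur strings.Builder

	// flush turns the text collected so far into a block, if non-empty.
	flush := func(item bool) {
		text := strings.Join(strings.Fields(cur.String()), " ")
		cur.Reset()
		if text == "" {
			return
		}
		if item {
			text = "- " + text
		}
		blocks = append(blocks, block{text: text, item: item})
	}

	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			switch t.Name.Local {
			case "p", "ul", "ol", "li":
				flush(false) // loose text before a block becomes a paragraph
			case "br":
				cur.WriteString(" ") // keep words on either side apart
			}
		case xml.EndElement:
			switch t.Name.Local {
			case "p", "ul", "ol":
				flush(false)
			case "li":
				flush(true)
			}
		case xml.CharData:
			cur.Write(t)
		}
	}
	flush(false)

	var out strings.Builder
	for i, b := range blocks {
		if i > 0 {
			if b.item && blocks[i-1].item {
				out.WriteString("\n") // adjacent list items stay together
			} else {
				out.WriteString("\n\n") // everything else gets a blank line
			}
		}
		out.WriteString(b.text)
	}
	return out.String(), nil
}

func main() {
	for _, in := range []string{
		"<p>Good day!</p>\n<p></p>\n\n<p>How are you?</p>",
		"<ul>\n  <li>First item</li>\n  <li>Second item</li>\n</ul>",
	} {
		text, err := htmlToText(in)
		if err != nil {
			panic(err)
		}
		fmt.Println(text)
		fmt.Println("---")
	}
}
```

With the two examples above, the paragraphs come back as "Good day!" and "How are you?" separated by a blank line, and the list comes back as "- First item" and "- Second item" on consecutive lines. Blockquotes, headers, links and nested lists are deliberately left out, which is where the real work starts.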

So what else does Hugo offer? Well, they support a wide variety of output formats, including text/plain. And they also have a nice RenderString function, which renders Markdown (or other input formats) into HTML. Hmm… can I modify this to support arbitrary output formats?

Two hours later, the answer is: Well, it’s complicated. It doesn’t actually output HTML, it outputs whatever the default format for that page is (which is often HTML, of course). So in principle, it should be possible to modify this code to output plain text… but there’s a lot of cache indirection and lazy loading standing in the way of a clear path. Okay. Not a quick fix. What else can I try?

Maybe I can use pandoc? There’s a pandoc wrapper for Go. But, my gosh, what a confusing API it has. And running it in Pipedream’s serverless infrastructure is incredibly slow. Also, the output isn’t the best.

So I started searching for other HTML-to-Text conversion libraries in Go.

I found this one, but it produces some pretty ugly output for blockquotes. And I want more flexibility for how to handle header tags.

That’s when I noticed this package is a fork of another, which has been maintained much more recently: github.com/jaytaylor/html2text

It still didn’t have quite the features I wanted, but why not fork it and add the features I want?

So that’s what I did… and promptly discovered that the test suite fails. Blah. I submitted a PR to fix that, although it looks unlikely to be merged.

Then I started adding the changes, starting with blockquoting.

Half a day later, it’s still not working correctly. There’s got to be an easier way forward…

I don’t have the solution yet. But what I expected to be a quick fix obviously isn’t one. Neither is my plan B, or my plan C.

What’s the point of all this? It’s the unknown unknowns, and the unknowable unknowns, that always bite us.

Quick fixes are often not quick at all. If you had told me last week, “Pipedream will let you automatically post comments on your LinkedIn posts. How long will it take you to set up that integration?” I would have said, “I can have it done in an hour.”
