Thursday, December 15, 2016

Wrangling downloading my own blog articles

99.9% of everything at the moment appears to be wrangling basic data ingest.

In a fit of semi-directionless curiosity, I decided to try expanding on my previous post by wiring up some kind of automatic blogging tool just to see if I could. This problem has many aspects. Let the yak shaving begin.

The first thing I did was start thinking about requirements a bit. I didn't, you know, write any down. But I did think about it.

I decided I needed an AI bot which could automatically write blog posts for me. I assume many people have tried and failed, or possibly even tried and succeeded, but I didn't want to let knowledge get in the way of the outcome and just ran at it.

Here is my design:
  -- A module for downloading input source data to feed into the AI
  -- A module for running learning jobs based on source data
  -- A module to write blog articles
  -- A module to publish / print out the blog articles
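
A rough sketch of how those four modules might be wired together, assuming each one boils down to a single function. Every name here is made up for illustration and the stand-in functions do nothing clever; the point is only the order in which data flows through.

```python
def run_pipeline(download, learn, write, publish):
    """Run the four stages in order and return the finished article."""
    source_texts = download()      # module 1: fetch input source data
    model = learn(source_texts)    # module 2: run a learning job on it
    article = write(model)         # module 3: write a blog article
    publish(article)               # module 4: publish / print it
    return article


# Hypothetical stand-ins for each module, just to show the data flow.
def download_stub():
    return ["first old post", "second old post"]


def learn_stub(texts):
    return {"word_count": sum(len(t.split()) for t in texts)}


def write_stub(model):
    return f"A post generated from {model['word_count']} words of source text."


published = []
article = run_pipeline(download_stub, learn_stub, write_stub, published.append)
print(article)
```

Passing the stages in as functions means any one of them can be swapped out later (say, scraping instead of an API) without touching the rest.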

I thought I'd wrangle some kind of thing which would create text of the required length using some kind of minimal implementation like a Markov model trained on my old blog posts. I should have just trained it on Project Gutenberg text or something, because it turns out that downloading my old blog posts is Harder Than It First Appears. Not really difficult, but longer than the hour or so I thought it would take.
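
For the curious, the kind of minimal Markov model I have in mind might look something like the sketch below. It is a word-level chain using only the standard library; the function names, the `order` parameter and the toy corpus are my own assumptions, not anything I have actually built yet.

```python
import random
from collections import defaultdict


def build_chain(text, order=1):
    """Map each run of `order` words to the list of words seen following it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state].append(words[i + order])
    return chain


def generate(chain, length, seed=None):
    """Walk the chain at random and return a string of `length` words."""
    if not chain:
        raise ValueError("the source text is too short to build a chain")
    rng = random.Random(seed)
    states = list(chain.keys())
    state = rng.choice(states)
    order = len(state)
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end (this state only appeared at the very end of the text):
            # jump to a random state and carry on.
            state = rng.choice(states)
            output.extend(state)
            continue
        output.append(rng.choice(followers))
        state = tuple(output[-order:])
    return " ".join(output[:length])


# Toy corpus standing in for the downloaded blog posts.
corpus = "the yak shaving begins and the yak shaving never ends"
chain = build_chain(corpus)
print(generate(chain, length=8, seed=1))
```

A higher `order` gives more coherent but less varied text, and needs a lot more source material to avoid simply parroting the input.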

There are basically two options: web scraping or the Blogger API. The tool du jour for scraping in Python is called Scrapy. I decided to use the Blogger API, not because I thought it was inherently better than scraping, but rather because I expect I'll get a reasonably structured object in memory at the other end and won't have to do a lot of page interpretation. Also, if I want to publish directly to Blogger later down the track, it will probably use the same mechanism.

Possibly because I'm bad at searching and terrible at web programming, this took me ages to get going. In fact, I haven't gotten going yet. I have merely jumped the first couple of an unknown number of hurdles. Programming is basically like playing an infinite scrolling computer game of randomly varying difficulty until you get to the end of the current level.

The plan was to create a minimal implementation which could then be bootstrapped into a smarter implementation. I thought people might like seeing it come together (I'd open source it, and then write about the process of building it here).

Here's where I've gotten:
  -- I've installed the requisite API
  -- I've discovered that I don't just need a 'project key', I actually need a full OAuth2 secret and key
  -- I read the source code to figure out that the filename that is supplied into the sample tool method in the library is actually only used to derive a directory to look for the "client_secrets.json" file, so I can actually supply a fake filename in order to tell it which directory to look for the file in. It should have just asked for a directory.
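
The filename trick in that last point can be illustrated with nothing but the standard library. This is a sketch of the behaviour as I understood it from reading the source, not the library's actual code; the helper name and the directory are invented for the example.

```python
import os


def client_secrets_path(filename):
    """Return where "client_secrets.json" would be looked for, given `filename`.

    Only the directory part of `filename` matters; the file itself
    does not need to exist.
    """
    return os.path.join(os.path.dirname(filename), "client_secrets.json")


# A fake filename whose only job is to point at the right directory.
fake_filename = os.path.join("my_blog_bot", "does_not_exist.py")
print(client_secrets_path(fake_filename))
```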

So, instead of a really cool Markov model, I have gotten three lines into reproducing the online walkthrough. Only took a couple hours...

More later! Wish me luck...


Wednesday, December 14, 2016

A thought experiment on making money from blogging

For some unknown reason, Google picked today (or, presumably, a few days ago) to verify my postal address. This is somehow part of being able to accept AdSense payments. I turned ads on at some point in my blogging life, probably about five years ago.

How much was waiting for me? Had I just won the lottery? Well, it looks like Google is willing to pay me $14.68 for my five years of highly varying blog posts.

The rest of this post is just my train of thought around whether it's actually possible to turn blogging into a sensible time investment based purely on basic text ads. If I scroll through my post history, it looks like a typical number of views, or impressions, or whatever, is two to three thousand per post (total ever). My posting history is pretty random, sometimes coming as part of a coherent and interesting series of posts, sometimes totally one-off, sometimes low-quality.

One of my best posts early on was when I asked people not to read it to see if I could get a baseline for how many nonhuman readers I had (bots, crawlers etc etc) because I was curious how many real people viewed my blog. This, obviously, had humans clicking furiously to find out why they shouldn't be clicking it. I should have seen that one coming! However, this blog has been gaining in popularity despite my barely posting, and it no longer stands out. I expect the general topic areas I post in are gaining wider interest.

My best ever post obtained 4080 views, and comprises a now out-of-date (although not badly so) install guide for some Python machine learning tools.

The AdSense rates I see right now are 22c per thousand page views, or 15c per thousand impressions. I'm a bit unclear whether those can be added together or not (e.g. say 37c per thousand visitors) or if they are different ways of tracking the average monetisation of my posts, expressed as a rate per thousand. I think the latter. So, crudely, I think if I can get 1000 of "whatever blogger counts up against each post" I will earn 22c. My average post, at around three thousand views, is then worth 66c. (Australian dollars)
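
That back-of-envelope sum, spelled out as a small sketch. It assumes the 22c-per-thousand figure is the one that applies and that "views" means whatever Blogger counts against each post; the function name is just for illustration.

```python
RATE_CENTS_PER_THOUSAND = 22  # AdSense rate per thousand page views, in cents


def earnings_cents(views, rate=RATE_CENTS_PER_THOUSAND):
    """Estimated earnings in cents for a post with the given number of views."""
    return views * rate / 1000


# The typical range for my posts is two to three thousand views.
for views in (2000, 3000):
    print(f"{views} views -> {earnings_cents(views):.0f}c")
```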

If I could increase my view rate tenfold, I could get $6.60 per post. A post can take me anywhere from 30 minutes to ten hours to produce, depending on how much technical work is needed to underpin it. I don't do this for the money, obviously, but rather just to write up and share my own work.

I think my readership likely consists of people who know me, have seen me speak, or are discovering posts based on topic keywords (such as via search). I suspect I could achieve a tenfold increase if I followed some basic tactics and invested 3 hours a week into the blog. I think that would represent achieving a larger readership within my traditional demographic by increasing both awareness of my blog and its reputation. This would be a mix of increased views per post, and increased posts per week, but still for the baseline time investment of 3 hours a pop.

If I could increase the readership an additional tenfold (100 times current), then I could haul in the princely sum of about $66 per week. As my significant other said, I should probably start by taking a packed lunch instead. Still, that wouldn't be nearly as interesting!!!
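
Here is the same reasoning as a quick sketch, scaling the 3,000-view, 22c-per-thousand baseline by successive factors of ten. The 3 hours per post comes from the estimate above; treating that as exactly one post a week is my simplifying assumption, and the names are illustrative.

```python
BASE_VIEWS = 3000             # typical views per post today
RATE_CENTS_PER_THOUSAND = 22  # AdSense rate, in cents per thousand views
HOURS_PER_POST = 3            # assumed time investment per post


def per_post_dollars(views):
    """Estimated earnings in dollars for a post with the given views."""
    return views * RATE_CENTS_PER_THOUSAND / 1000 / 100


for tenfolds in range(3):
    views = BASE_VIEWS * 10 ** tenfolds
    dollars = per_post_dollars(views)
    hourly = dollars / HOURS_PER_POST
    print(f"{views:>9,} views  ${dollars:>6.2f} per post  ${hourly:.2f} per hour")
```

Even after two tenfold increases the hourly figure stays well below anything resembling a wage, which is rather the point of the packed lunch.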

I suspect that the second tenfold increase would require a demographic breakthrough, or a major reputational increase. I would probably need to start investing time in researching what people think is popular, and generally speaking work for my money rather than just posting whatever's on my mind. I suspect it would be possible. I can conceive of multiple posting tracks which would spread interest more widely, including:
  -- More compelling technical content
  -- Commentary and trend analysis on technology
  -- Summaries of news and recent events in technology
  -- Writing of short fiction and stories (I dabble. I'm not good, but I suspect it would still have a positive impact on the numbers)
  -- Posting of automatically-generated content from a home-grown AI (no, I haven't built one yet, but I suspect it could be done. Again, not done well, but enough to positively impact the numbers)

Those various tactics would probably be sufficiently self-interesting that (assuming I had the spare time), I would basically enjoy the process. By this stage, the nominal visitor count would be 300,000 views per post. That sounds like a lot to me. Perhaps enough to make me question whether my reasoning has been at all correct to this point.

I spent some time reading various "start a blog" articles to try to validate what I'm thinking. The numbers just don't seem to add up based on what I'm reading. Some people seem to struggle to get 300 views per post, which is probably right for a new blog on a niche topic. Some people say getting on the front page of a major news site nets them about 300k views. All the moneymaking advice is to avoid ads and focus on things like product sales and other avenues that sound like a lot more work than doing basically nothing other than posting.

What is the experience of others? What numbers are people seeing in terms of visitors currently? Should I focus on the number of views/impressions gathered in the first week of the post only? Is my current view rate per post high, medium, or low?

Is this post at all interesting to others?

If I could achieve a third tenfold increase up to 3,000,000 views per post, then I'd be starting to haul in an amount of money that would start to be motivating in its own right. Right now, that seems like a ludicrous number to consider. At that kind of view rate, I would probably have to be substantially more concerned about what I actually said, just in case someone decided to listen to me or be seriously influenced by the content. Does anyone out there get 3m views per post without massive time investment?

I'm not convinced I have been totally clear in this post. However, in the interest of maximising my return-on-time-investment, I'm *not* going to go back and tidy up! It's been about 25 minutes of typing and thinking, and I'm worried about over-investing into that 66c return!