CloudKit – now that’s how to do a secure Database for users

Data Breach Hand Brick Wall Computer

One of the big controversies here relates to the appetite of the current UK government to release personal data with only the most basic understanding of what constitutes personally identifiable information. The lessons are there in history, but I fear that without knowing the context of the infamous AOL data leak, we are destined to repeat it. With it goes personal information that we typically hold close to our chests, whose release may cause personal, social or (in the final analysis) financial prejudice.

When plans were first announced to release NHS records to third parties, and in the absence of what I thought were appropriate controls, I sought (with a heavy heart) to opt out of sharing my medical history with any third party – and instructed my GP accordingly. I’d gladly share everything with satisfactory controls in place (medical research is really important and should be encouraged), but I felt that insufficient care was being exercised. That said, we’re more than happy for my wife’s genome to be stored in the USA by 23andMe – a company that demonstrably satisfied our privacy concerns.

It therefore came as quite a shock to find that a report, highlighting which third parties had already been granted access to health data with government-mandated approval, ran to a total of 459 data releases to 160 organisations (last time I looked, that was 47 pages of PDF). See this and the associated PDFs on that page. Given the level of controls in place, I felt this was outrageous. Likewise the plans to release HMRC-related personal financial data, again with soothing words from ministers who, given the NHS data precedent, appear to have no empathy for the gross injustices likely to result from their actions.

The simple fact is that what constitutes individually identifiable information needs to be framed not only by which data fields are shared with a third party, but also by how that data will be applied by the processing party – not least if there is any suggestion that the data is to be combined with other sources, which could in turn triangulate seemingly “anonymous” records back to a specific individual. Which is precisely what happened in the AOL data leak example cited.

With that, and on a somewhat unrelated technical/programmer-orientated journey, I set out to learn how Apple had architected its new CloudKit API, announced this last week. This articulates the way in which applications running on your iPhone handset, iPad or Mac have a trusted way of accessing personal data stored (and synchronised between all of a user’s Apple devices) “in the Cloud”.

The central identifier that Apple associate with you, as a customer, is your Apple ID – typically an email address. In the cloud, they give you access to two databases on their infrastructure: one public, the other private. However, the second you try to create or access a table in either, the API accepts your iCloud identity and spits back a hash unique to the combination of your identity and the application on the iPhone asking to process that data. Different application, different hash. And since everyone’s data is in there, that design immediately prevents any triangulation of disparate data that could trace back to uniquely identify a single user.
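
A minimal sketch of that idea – not Apple’s actual implementation, just the shape of it. A keyed hash over the user’s identity and the app’s container name yields an opaque, per-application identifier, so the same user looks entirely different to every app (all names below are my own inventions):

```python
import hashlib
import hmac

def scoped_user_id(apple_id: str, container_id: str, server_secret: bytes) -> str:
    """Derive an opaque per-application user identifier.

    Conceptually like CloudKit handing each app container a different
    hash for the same iCloud account: same user, different container,
    unlinkable IDs.
    """
    msg = f"{apple_id}|{container_id}".encode()
    return hmac.new(server_secret, msg, hashlib.sha256).hexdigest()

secret = b"server-side-secret"  # held by the service, never seen by apps
a = scoped_user_id("jane@example.com", "iCloud.com.example.notes", secret)
b = scoped_user_id("jane@example.com", "iCloud.com.example.fitness", secret)
assert a != b   # different application, different hash
assert a == scoped_user_id("jane@example.com", "iCloud.com.example.notes", secret)
```

Because the secret never leaves the server, no application can reverse its identifier or correlate it with another app’s – which is exactly the anti-triangulation property described above.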

Apple take this one stage further, in that any application that asks for personally identifiable data (like an email address, age, postcode, etc.) from any table has to have access to that information specifically approved by the handset’s end user; no explicit permission (on a per-application basis), no data.

The data maintained by Apple – personal information, health data (with HealthKit), details of the home automation kit in your house (with HomeKit), and not least the credit card details stored to buy music, books and apps – makes full use of this security model. And they’ve dogfooded it, so that third-party application providers use exactly the same model, and the same back-end infrastructure. Which is also very, very inexpensive (data volumes go into petabytes before you spend much money).

There are still some nuances I need to work out. I’m used to SQL databases and to some NoSQL database structures (I’m MongoDB certified), but it’s not clear, from looking at the way the database works, which engine is being used behind the scenes. It appears to be a key:value store with some garbage-collection mechanics that look like a hybrid file system. It also has the capability to store “subscriptions”, so that if specific criteria appear in the data store, specific messages can be dispatched to the user’s devices over the network automatically. Hence things like new diary appointments in a calendar can be synced across a user’s iPhone, iPad and Mac transparently, without each needing to waste battery power polling the large database on the server for events that are likely to arrive infrequently.
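
The subscription behaviour can be sketched as a tiny push model (my own toy class and names, nothing to do with the real CloudKit API): the store evaluates each saved record against registered predicates and pushes a notification to matching subscribers, so the devices never poll:

```python
class Store:
    """Toy record store with CloudKit-style subscriptions."""

    def __init__(self):
        self.records = []
        self.subscriptions = []   # list of (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        # callback fires whenever a saved record satisfies predicate
        self.subscriptions.append((predicate, callback))

    def save(self, record: dict):
        self.records.append(record)
        # push matching changes out, rather than having clients poll
        for predicate, callback in self.subscriptions:
            if predicate(record):
                callback(record)

notified = []
store = Store()
store.subscribe(lambda r: r.get("type") == "appointment",
                lambda r: notified.append(r["title"]))
store.save({"type": "appointment", "title": "Dentist"})
store.save({"type": "note", "title": "Shopping list"})
assert notified == ["Dentist"]   # only the matching record triggered a push
```

The battery saving falls straight out of the structure: the work of matching happens once, server-side, at write time, instead of on every device at poll time.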

The final piece of the puzzle I’ve not worked out yet is, if you already have a large database (say of the calories, carbs, protein, fat and weights of thousands of foods in a nutrition database), how you’d get it loaded into an instance of the public database in Apple’s cloud. Other than writing custom loading code, of course!

That apart, I’m really impressed by how Apple have designed the datastore to ensure the security of users’ personal data, and to make it impossible to triangulate between information stored by different applications. If any personally identifiable data is requested by an application, the user of the handset has to specifically authorise its disclosure, for that application only. And the app has no way to sense whether the data is even present ahead of that release permission (so, for example, if a health app wants to gain access to your blood-sampling data, it doesn’t know whether that data exists before the permission is given – so the app can’t draw inferences about your probably having diabetes, which would be possible if it could deduce that you were recording glucose readings at all).

In summary, impressive design and a model that deserves our total respect. The more difficult job will be to get the same mindset into the folks looking to release our most personal data, shared privately with our public sector servants. They owe us nothing less.

Officially Certified: AWS Business Professional

AWS Business Professional Certification

That’s added another badge, albeit the primary reason was to understand AWS’s products and services in order to suss out how to build volumes for them via resellers – just in case I get the opportunity to be asked how I’d do it. However, looking over the fence at some of the technical accreditation exams, I appear to know around half of the answers there already – but I need to prepare properly and take notes before attempting those.

(One of my old party tricks used to be that I could make it past the entrance exam required for entry into the technical streams at Linux-related conferences – a rare thing for a senior manager running large software business operations or product marketing teams. Being an ex-programmer who occasionally fiddles under the bonnet of modern development tools is a useful thing – not least to feed an ability to spot bullshit from quite a distance.)

The only AWS module I had any difficulty with was the pricing. One of the things most managers value is simplicity and predictability, but a lot of the core services have pricing dependencies where you need to know data sizes, I/O rates or the way your demand moves through peaks and troughs in order to arrive at an approximate monthly price. While most of the case studies amply demonstrate that you do make significant savings compared to running workloads on your own in-house infrastructure, I think typical values for common use cases would be useful. For example, if I’m running a SAP installation of specific data and access dimensions, what are the typical running costs – without needing to insert probes all over a running example to estimate them using the provided calculator?
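
To show the sort of back-of-envelope sum I mean, here’s the shape of a monthly estimate. The structure (compute + storage + data transfer out) reflects how the bills are built up; the rates themselves are illustrative placeholders, not real AWS prices:

```python
def monthly_estimate(instance_hourly: float, hours: float,
                     storage_gb: float, storage_rate: float,
                     transfer_out_gb: float, transfer_rate: float) -> float:
    """Rough monthly bill: compute + storage + data transfer out.

    All rates here are invented for illustration; real pricing varies
    by region, instance type and tier.
    """
    return (instance_hourly * hours          # instance-hours
            + storage_gb * storage_rate      # GB-months of storage
            + transfer_out_gb * transfer_rate)  # GB transferred out

# e.g. one always-on instance (~730 hours), 40 GB of disk, 100 GB out
cost = monthly_estimate(0.05, 730, 40, 0.10, 100, 0.09)
assert round(cost, 2) == 49.50
```

The point is precisely the one above: you can only fill in the arguments if you already know your data sizes and traffic shape – which is what makes the estimate hard before you’ve run the workload.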

I’d come back from a 7am gym session fairly tired and made the mistake of stepping through the pricing slides without making copious notes. I duly did all that module again and did things properly the next time around – and passed it to complete my certification.

The Lego bricks you snap together to design an application infrastructure are simple in principle and loosely coupled, and what Amazon have built is very impressive. The only thing not provided out of the box is the sort of simple developer bundle of an EC2 instance, some S3 and MySQL-backed EBS storage, plus some open source AMIs preconfigured to run WordPress, Joomla, Node.js, LAMP or similar – with a simple weekly automatic backup. That’s what Digital Ocean provide for a virtual machine instance, with specific storage and high Internet transfer-out limits, for a fixed price per month. In the case of the WordPress network on which my customers and this blog run, that’s a 2-CPU server instance, 40GB of disk space and 4TB/month of data traffic for $20/month all in. That sort of simplicity is why many startup developers have made an exit stage left from Rackspace and their ilk, and moved to Digital Ocean in their thousands; it’s predictable and good enough as an experimental sandpit.

The ceiling at AWS is much higher when the application slips into production – which is probably reason enough to put the development work there in the first place.

I have deployed an Amazon WorkSpace to complete my 12 years of nutrition data analytics work using the Windows-only Tableau Desktop Professional – in an environment where I have no Windows PCs available to me. I’ve used it on my MacBook Air and on my iPad Mini to good effect. That will cost me just north of £21 ($35) for the month.

I think there’s a lot that can be done to accelerate adoption rates of AWS services in Enterprise IT shops, both in terms of direct engagement and with channels to market properly engaged. My real challenge is getting air time with anyone to show them how – and in the interim, getting some examples ready in case I can make it in to do so.

That said, I recommend the AWS training to anyone. There is some training made available the other side of applying to be a member of the Amazon Partner Network, but there are equally some great technical courses that anyone can take online. See http://aws.amazon.com/training/ for further details.

Help available to keep malicious users away from your good work

Picture of a Stack of Tins of Spam Meat

One thing that still routinely shocks me is the sheer quantity of malicious activity that goes on behind the scenes of any web site I’ve put up. When we were building Internet Vulnerability Testing Services at BT, around 7 new exploits or attack vectors were emerging every 24 hours. Fortunately, for those of us who use open source software, the protections have usually been inherent in the good design of the code, and most exploits (OpenSSL Heartbleed excepted) have had no real impact given good planning. It all starts with closing off ports, and restricting access to some key ones from only known fixed IP addresses (that’s the first thing I did when I first provisioned our servers in Digital Ocean Amsterdam – I’m just surprised they don’t give you a template to work from; fortunately I keep my own default rules to apply immediately).

With WordPress, it’s required an investment in a number of plugins to stem the tide. Basic ones like Comment Control, which can lock down pages, posts, images and attachments from having comments added to them (by default, a spammer’s paradise). Where you do allow comments, you install the WordPress-provided Akismet, which classifies at least 99% of the spam attempts and sticks them in the spam folder straight away. For me, I choose to moderate any comment from someone whose content I’ve not approved before, and am totally ruthless with any attempt at social engineering; the latter because if they post something successfully with approval a couple of times, their later comment spam with unwanted links gets onto the web site immediately, until I notice and take it down. I prefer never to let them get to that stage in the first place.

I’ve been setting up a web site in our network for my daughter-in-law to allow her to blog about mental health issues for children, including ADHD, Asperger’s and related afflictions. For that, I installed BuddyPress to give her user community a discussion forum, and went to bed knowing I hadn’t even put her domain name up – it was just another set of deep links into my WordPress network at the time.

By the morning, 4 user registrations, 3 of them with spoofed addresses. Duly removed, and the ability to register usernames turned off completely while I fix things. I’m going to install WP-FB-Connect to allow Facebook users to work on the site with their Facebook login credentials, and to install WangGuard to stop the “splogger” bots. That is free for the volume of usage we expect (and the commercial dimensions of the site – namely non-profit and charitable), and appears to do a great job sharing data on who and where these attempts come from. I’ve just got to check that turning these on doesn’t throw up a request to log in if users touch any of the other sites in the WordPress network we run on our servers, whose user communities don’t need to log on at any time, at all.

Unfortunately, progress was rather slowed down over the weekend by a reviewer from Kenya who published a list of the 10 best add-ons to BuddyPress, #1 of which was a social network login product that could authenticate with Facebook or Twitter. Lots of “Great article, thanks” replies. In reality, it didn’t work with BuddyPress at all! I duly posted back to warn others – if indeed he lets that news of his incompetence in this instance get back to his readers.

As it is, a lot of WordPress plugins (there are circa 157 of them to do social site authentication alone) are of variable quality. I tend to judge them by the number of support requests that have been resolved quickly in the previous few weeks – one nice feature of the plugin listings provided. I also have formal support contracts in place with Cyberchimps (for some of their themes) and with WPMU Dev (for some of their excellent Multisite add-ons).

That aside, we now have the network running with all the right tools, and things seem to be working reliably. I’ve just added all the page hooks for Google Analytics and Bing Webmaster Tools to feed from, and all is okay at this stage. The only thing I’d like to invest in is something to watch all the various log files on the server and give me notifications if anything awry is happening (like MySQL claiming an inability to connect to the WordPress database, or Apache spawning multiple instances and running out of memory – something I had in the early days when the Google bot was touching specific web pages, since fixed).

Just a shame that there are still so many malicious link spammers out there; they waste 30 minutes of my day, every day, just clearing their useless gunk out. But thank god that Google are now penalising these very effectively; long may that continue, and hopefully the realisation of the error of their ways will lead them to become more useful members of the worldwide community going forward.

Programming and my own sordid past

Austin Maestro LCP5

Someone asked me what sort of stuff I’ve programmed down my history. I don’t think I’ve ever documented it in one place, so I’m going to attempt a short summary here. (I even saw the car above while it was still in R&D at British Leyland!) There are lots of other smaller hacks, but this gives a flavour of the more sizable efforts. The end result is why I keep technically adept, even though most roles I have these days are more managerial in nature, where the main asset attainable is being able to suss BS from a long distance.

Things like Excel, 1-2-3, Tableau Desktop Professional and latterly Google Fusion Tables are all IanW staples these days, but I’ve not counted these as real programming tools. Nor have I counted the use of SQL commands to extract data from database tables directly in MySQL, or within Microsoft SQL Server Reporting Services (SSRS), which I’ve also picked up along the way. Ditto the JavaScript-based UI in front of MongoDB.

Outside of these, the projects have been as follows:

JOSS Language Interpreter (A-level project: PAL-III assembler). This was my tutor’s university project: a simple language consisting of only 5 commands. I wrote the syntax checker and associated interpreter. It didn’t even have a “run” command; you just did a J 0 (Jump to Line Zero) to set it in motion.

Magic Square Solver (FOCAL-8). Managed to work out how to build a 4×4 magic square where every row, column, both diagonals and the centre four squares all added up to the same number. You could tap in any number and it would work out the square for you and print it out.
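
For the curious, checking such a square is straightforward; here’s a quick verification in Python (using Dürer’s well-known 4×4 square rather than anything from my FOCAL-8 original, which is long gone):

```python
def magic_sums(sq):
    """All the sums described above: 4 rows, 4 columns, both diagonals
    and the centre 2x2 block of a 4x4 square."""
    n = 4
    sums = [sum(row) for row in sq]                            # rows
    sums += [sum(sq[r][c] for r in range(n)) for c in range(n)]  # columns
    sums.append(sum(sq[i][i] for i in range(n)))               # main diagonal
    sums.append(sum(sq[i][n - 1 - i] for i in range(n)))       # anti-diagonal
    sums.append(sq[1][1] + sq[1][2] + sq[2][1] + sq[2][2])     # centre four
    return sums

durer = [[16,  3,  2, 13],
         [ 5, 10, 11,  8],
         [ 9,  6,  7, 12],
         [ 4, 15, 14,  1]]
assert set(magic_sums(durer)) == {34}   # every one of the 11 sums is 34
```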

Paper Tape Spooler (Basic Plus on RSTS/E). My first job at Digital (as a trainee programmer) was running off the paper tape diagnostics my division shipped out with custom-built hardware options. At the time, paper tape was the universal data transfer medium for PDP-8 and PDP-11 computers. My code spooled multiple copies out, restarting from the beginning of the current copy automatically if the drive ran out of paper tape midway through. It also permitted the operator to input a message, which was printed out in 8×7-dot letter shapes using the 8-hole punch at the front of each tape – so the field service engineer could readily know what was on the tape.

Wirewrap Optimiser (Fortran-11 on RSX-11M). At the time, my division of DEC was building custom circuit boards for customers to use on their PDP-8 and PDP-11 computers, and extensive use was made of wire-wrapped backplanes into which the boards plugged, along with the associated Omnibus, Unibus or Q-Bus electronics. The Wirewrap program was adapted from a piece of public domain code to tell the operator (holding a wire-wrap gun) which pins on a backplane to wire together and in what sequence. This was to nominally minimise the number of connections needed, and to make the end result as maintainable as possible (to avoid having too many layers of wires to unpick if a mistake was made during the build).

Budgeting Suite (Basic Plus on RSTS/E). Before we knew of this thing called a spreadsheet (it was a year after VisiCalc had first appeared on the Apple ][), I coded up a budget model for my division of DEC in Basic Plus. It was used to model the business as it migrated from doing individual custom hardware and software projects into one where we looked to routinely resell what we’d engineered to other customers. Used extensively by the divisional board director that year to produce his budget.

Diagnostics (Far too many to mention, predominantly Macro-11 with the occasional piece of PAL-III PDP-8 Assembler, standalone code or adapted to run under DEC-X/11). After two years of pushing bits to device registers, and ensuring other bits changed in sync, it became a bit routine and I needed to get out. I needed to talk to customers … which I did on my next assignment, and then escaped to Digital Bristol.

VT31 Light Pen Driver (Macro-11 on RSX-11M). The VT31 was a bit-mapped display, and you could address every pixel on it individually. The guy who wrote the diagnostic code (Bob Grindley) managed to get it to draw circles using just increment and decrement instructions – no sign of any trig functions anywhere – which I thought was insanely neat. So neat, I got him to write it up as a flowchart, which I still have in my files to this day. That apart, one of our OEM customers needed to fire actions off if someone pressed the pen button while the pen was pointing at a location somewhere on the screen. My RSX-11M driver responded to a $QIO request to feed back the button-press event and the screen location being pointed at when it occurred, either directly or handled as an Asynchronous System Trap (AST in PDP-11 parlance). Did the job; I think it was used in some aerospace radar-related application.
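
I don’t have Bob’s flowchart to hand here, but the midpoint circle algorithm gives the same flavour of trick: a full circle from integer additions, subtractions and comparisons alone, not a trig function in sight:

```python
def circle_points(radius):
    """Midpoint circle algorithm: integer adds and subtracts only.

    The same family of technique the VT31 diagnostic likely used –
    the error term decides when to step inward, no sin/cos needed.
    """
    x, y = radius, 0
    err = 1 - radius
    points = set()
    while x >= y:
        # mirror one computed octant into all eight
        for sx, sy in ((x, y), (y, x)):
            points.update({(sx, sy), (-sx, sy), (sx, -sy), (-sx, -sy)})
        y += 1
        if err < 0:
            err += 2 * y + 1          # stay on the same column
        else:
            x -= 1
            err += 2 * (y - x) + 1    # step diagonally inward

    return points

pts = circle_points(5)
assert (5, 0) in pts and (0, 5) in pts and (3, 4) in pts   # 3-4-5 lands exactly on the circle
```

On a PDP-11 the additions collapse into increments and decrements in a register, which is presumably how the original managed with so little.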

Kongsberg Plotter Driver (Pressed Steel Fisher, Macro-11 on RSX-11M). Pressed Steel Fisher were the division of British Leyland in Cowley, Oxford who pressed the steel plates that made Austin- and Morris-branded car bodies. The Kongsberg plotter drew full-size stencils which were used to fabricate the car-size body panels; my code drove the pen on it from the customer’s own code, converted to run on a PDP-11. The main fascination personally was being walked through one workshop where a full-size body of an as-yet-unannounced car was sitting there complete. Called the LCP5 at that stage, it was released a year later under the name Austin Maestro – the mid-range big brother to the now largely forgotten Mini Metro.

Spanish Lottery Random Number Generator (De La Rue, Macro-11 on RSX-11M). De La Rue had a secure printing division that printed most of the cheque books used in the UK back in the 1980s. They were contracted by the Spanish Lottery to provide a random number generator. I’m not sure if this was just to test things or if it was used for the real McCoy, but I was asked to provide one nonetheless. I wrote all the API code and unashamedly stole the well-tested random generator code itself from the sources of RT-11, the single-user, foreground/background-only operating system. It worked well, and the customer was happy with the result. I may have passed up the opportunity to become really wealthy in being so professional 🙂
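
I can’t recall the constants the RT-11 routine used, but the style of generator typical of that era is a linear congruential one; here’s a sketch, using the classic C-library rand() constants purely for illustration:

```python
def lcg(seed, a=1103515245, c=12345, m=2**31):
    """A linear congruential generator: x' = (a*x + c) mod m.

    The constants are the classic C rand() ones, chosen here only as a
    well-known example; the actual RT-11 routine's parameters differed.
    """
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

g = lcg(42)
draws = [next(g) % 49 + 1 for _ in range(6)]   # e.g. six lottery balls in 1..49
assert all(1 <= n <= 49 for n in draws)
```

The appeal of reusing an OS library routine rather than rolling your own is exactly the point made above: the generator was already well tested.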

VAX PC-11 Paper Tape Driver (Racal Redac, Thorn EMI Wookey Hole, others, Macro-32 on VAX/VMS). Someone from Educational Services had written a driver for the old PC11 8-hole Paper Tape Reader and Punch as an example driver. Unfortunately, if it ran out of paper tape when outputting the blank header or trailer (you had to leave enough blank tape either end to feed the reader properly), then the whole system crashed. Something of an inconvenience if it was supposed to be doing work for 100’s of other users at the same time. I cleaned up the code, fixed the bug and then added extra code to print a message on the header as i’d done earlier in my career. The result was used in several applications to drive printed circuit board, milling and other manufacturing machines which still used paper tape input at that stage.

Stealth Tester, VAX/VMS Space Invaders (British Aerospace, VAX Fortran on VAX/VMS). Not an official project, but one of our contacts at British Aerospace in Filton requested help fixing a number of bugs in his lunchtime project – to implement space invaders to work on VAX/VMS for any user on an attached VT100 terminal. The team (David Foddy, Bob Haycocks and Maurice Wilden) nearly got outed when pouring over a listing when the branch manager (Peter Shelton) walked into the office unexpectedly, though he left seemingly impressed by his employees working so hard to fix a problem with VAX Fortran “for BAE”. Unfortunately, I was the weak link a few days later; the same manager walked into the Computer Room when I was testing the debugged version, but before they’d added the code to escape quickly if the operator tapped control-C on the keyboard. When he looked over my shoulder after seeing me frantically trying to abort something, he was greeted by the Space Invaders Superleague, complete with the pseudonyms of all the testers onboard. Top of that list being Flash Gordon’s Granny (aka Maurice Wilden) and two belonging to Bob Haycocks (Gloria Stitz and Norma Snockers). Fortunately, he saw the funny side!

VMS TP Monitor Journal Restore (Birds Eye Walls, Macro-32 on VAX/VMS). We won an order to supply 17 VAX computers to Birds Eye Walls, nominally for their “Nixdorf Replacement Project”. The system was a TP Monitor that allowed hundreds of telesales agents to take orders for Birds Eye Frozen Peas, other Frozen goods and Walls Ice Cream from retailers – and play the results into their ERP system. I wrote the code that restored the databases from the database journal in the event of a system malfunction, hence minimising downtime.
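
The recovery logic itself is conceptually simple: replay the journalled writes over the last good state. A toy sketch of that shape (my own data layout, nothing like the actual Birds Eye Walls formats):

```python
def restore(snapshot: dict, journal: list) -> dict:
    """Rebuild database state by replaying journalled operations over
    the last good snapshot - the general shape of journal recovery."""
    db = dict(snapshot)
    for op, key, value in journal:
        if op == "put":
            db[key] = value
        elif op == "delete":
            db.pop(key, None)
    return db

snapshot = {"order:1": "peas x10"}
journal = [("put", "order:2", "ice cream x3"),
           ("delete", "order:1", None),
           ("put", "order:3", "fish fingers x5")]
assert restore(snapshot, journal) == {"order:2": "ice cream x3",
                                      "order:3": "fish fingers x5"}
```

The downtime saving comes from only ever replaying the journal tail since the last snapshot, rather than re-entering a day’s orders by hand.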

VMS TP Monitor Test Suite (Birds Eye Walls, Macro-32 and VAX Cobol on VAX/VMS). Having done the database restore code, I was asked to write some test programs to do regression tests on the system as we developed the TP Monitor. Helped it all ship on time and within budget.

VMS Print Symbiont Job Logger (Birds Eye Walls, Macro-32 on VAX/VMS). One of the big scams on the previous system was the occasional double printing of a customer invoice, which doubled as a pick list for the frozen food delivery drivers. If such a thing happened, inadvertently or on purpose, it was important to spot the duplicate printing and ensure the delivery driver only received one copy (otherwise they’d be likely to receive two identical pick lists, take away goods and then be tempted to lose one invoice copy; free goods). I had to modify the VMS Print Symbiont (the system print spooler) to add code to log each invoice or pick list printed – for subsequent audit by other people’s code.

Tape Cracking Utilities (36 various situations, Macro-32 on VAX/VMS). After moving into presales, the usual case was to be handed some Fortran, Cobol or other code on an 800 or 1600bpi magnetic tape to port over and benchmark. I ended up being the district (3 offices) expert on reading all sorts of tapes from IBM, ICL and a myriad of other manufacturers’ systems. I built a suite of analysis tools to help work out the data structures on them, and then other Macro-32 code to read the data and put it in a format usable on VAX/VMS systems. The customer code was normally pretty easy to get running, and benchmarks were timed after that. The usual party trick was then to put the source code through a tool called “PME”, which took the place of the source code debugger and sampled the PC (Program Counter) 50 times per second as the program ran. Once finished, an associated program output a graph showing where the user’s software was spending all its time; a quick tweak in a small subroutine amongst a mountain of code, and zap – the program ran even faster. PME was later productised by its author, Bert Beander; the code became what was then known as the VAX Performance and Coverage Analyzer (PCA).
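
The PME idea – periodically sampling where the program counter happens to be, rather than instrumenting the code – still works today. A rough Unix-only sketch using profiling signals (my own illustration, not PME itself; at the Python level we sample the current function name rather than a raw PC):

```python
import collections
import signal
import time

samples = collections.Counter()

def _sample(signum, frame):
    # record where execution is right now, PME-style
    samples[frame.f_code.co_name] += 1

def profile(fn, seconds=0.3, hz=100):
    """Run fn repeatedly for `seconds` of CPU time, sampling at `hz`."""
    signal.signal(signal.SIGPROF, _sample)
    signal.setitimer(signal.ITIMER_PROF, 1.0 / hz, 1.0 / hz)
    t_end = time.process_time() + seconds
    while time.process_time() < t_end:
        fn()
    signal.setitimer(signal.ITIMER_PROF, 0)   # stop sampling

def hot_loop():
    s = 0
    for i in range(20000):
        s += i * i
    return s

profile(hot_loop)
assert sum(samples.values()) > 0   # the histogram shows where the time went
```

The charm of the approach, then as now, is its near-zero overhead: the program under test runs essentially at full speed while the histogram builds up.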

Sales Out Reporting System (Datatrieve on VAX/VMS). When drafted in to look after our two industrial distributors, I wrote some code that consolidated all the weekly sales-out reporting for our terminals and systems businesses (distributors down to the resellers that bought through each) and mapped the sales onto the direct account team looking after each end-user account that purchased the goods. They got credit for those sales as though they’d made the sales themselves, so they worked really effectively at opening the doors to the routine high-volume but low-order-value fulfilment channels; the whole chain working together really effectively to maximise sales for the company. That allowed the end-user direct account teams to focus on the larger opportunities in their accounts.

Bakery Recipe Costing System (GW-Basic on MS-DOS). My father started his own bakery in Tetbury, Gloucestershire, selling up his house in Reading to buy a large 5-storey building (including shopfront) at 21 Long Street there. He then took out sizable loans to pay for an oven, associated craft bakery equipment and shop fittings. I managed to take a lot of the weight off his shoulders when he was originally seeing lots of spend before any likely income, by projecting all his cashflows in a spreadsheet. I then wrote a large GW-Basic application (the listing was longer than our combined living and dining room floors at the time) to maintain all his recipes, including ingredient costs. He then ran the business with a cash float of circa 6% of annual income. If it trended higher, he banked the excess; if it trended lower, he input the latest ingredient costs into the model, which then recalculated the markups on all his finished goods to raise his shop prices. That code, running on a DEC Rainbow PC, lasted over 20 years – after which I recoded it in Excel.
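
The heart of that model fits in a few lines – ingredient costs go in, and a markup re-derives the shop price per item. The recipes, prices, markup and batch size below are invented for illustration, not my father’s actual figures:

```python
def unit_cost(recipe, ingredient_prices):
    """Cost of one batch from current ingredient prices (per kg)."""
    return sum(qty_kg * ingredient_prices[name]
               for name, qty_kg in recipe.items())

def shop_price(recipe, ingredient_prices, markup=2.5, batch_size=10):
    """Re-derive the retail price per item when ingredient costs change."""
    return round(unit_cost(recipe, ingredient_prices) * markup / batch_size, 2)

loaf = {"flour": 0.5, "yeast": 0.01, "butter": 0.05}   # kg per batch
prices = {"flour": 0.80, "yeast": 4.00, "butter": 6.00}  # £ per kg

p1 = shop_price(loaf, prices)
prices["flour"] = 1.00          # the flour price rises...
p2 = shop_price(loaf, prices)
assert p2 > p1                  # ...so the shop price follows it up
```

Run across every recipe in the book, that one recalculation is what kept the cash float steady whenever ingredient prices moved.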

CoeliacPantry e-Commerce Site (Yolanda Confectionery, predominantly PHP on Red Hat Linux 7.2). My wife and father’s business making bread and cakes for sufferers of coeliac disease (an intolerance of the gluten found in wheat products). I built the whole shebang from scratch, learning Linux from a book, running on a server in the Rackshack (later EV1Servers) datacentre in Texas, using Apache, MySQL and PHP. I bought Zend Studio to debug the code, and employed GPG to encrypt passwords and customer credit card details (the latter maintained off the server). Over 300 sales transactions, no chargebacks, until we had to close the business due to the ill health of our baker.

Volume/Value Business Line Mapping (Computacenter, VBA for Excel, MS-Windows). My Volume Sales part of the UK software business was accountable for all sales of software products invoiced for amounts under £100,000, or where the order was for a Microsoft SELECT licence; one of my peers (and his team of business development managers) focussed on Microsoft Enterprise Agreements and single orders of £100,000 or more. A simple piece of Visual Basic for Applications (VBA) code classified a software sale based on these criteria, and attributed it to the correct unit.
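
The classification rule translates directly into code; a sketch of the logic in Python rather than the original VBA (function and parameter names are mine):

```python
def business_line(order_value: float, is_select: bool, is_ea: bool) -> str:
    """Attribute a software order to the Volume or Enterprise unit.

    Rules as described above: SELECT licences go to Volume at any
    order value; Enterprise Agreements and other single orders of
    £100,000 or more go to Enterprise; everything else is Volume.
    """
    if is_ea or (order_value >= 100_000 and not is_select):
        return "Enterprise"
    return "Volume"

assert business_line(25_000, False, False) == "Volume"
assert business_line(250_000, False, False) == "Enterprise"
assert business_line(250_000, True, False) == "Volume"      # SELECT stays Volume
assert business_line(50_000, False, True) == "Enterprise"   # EA regardless of value
```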

MongoDB Test Code (self-training: Python on OS X). I did a complete “MongoDB for Python Developers” course having never before used Python, but got to grips with it pretty quickly (it is a lovely language to learn). All my test code for the various exercises in the 6-week course was written in Python. For me, the main fascination was how MongoDB works by mapping its database file into the address space above its own code, so that the operating system’s own paging mechanism does all the heavy lifting. That’s exactly how we implemented virtual files for the TP Monitor for Birds Eye Walls back in 1981-2. With that, I’ve come full circle.
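
The memory-mapping trick is easy to demonstrate in miniature: map a file into the address space, write to it as though it were ordinary memory, and let the operating system’s pager carry the bytes to disk (a toy file here, obviously, not a MongoDB data file):

```python
import mmap
import os
import tempfile

# create a small backing file, as a database engine would
path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

# map it into our address space: reads and writes become plain
# memory accesses, and the OS paging mechanism does the I/O
with open(path, "r+b") as f:
    mem = mmap.mmap(f.fileno(), 4096)
    mem[0:5] = b"hello"   # an ordinary memory write...
    mem.flush()
    mem.close()

with open(path, "rb") as f:
    data = f.read(5)
assert data == b"hello"   # ...which the pager persisted to the file
```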

Software Enabled (WordPress Network). My latest hack – the Ubuntu Linux server running Apache, MySQL, PHP and the WordPress network that you are reading these words from right now. It’s based on Digital Ocean servers in Amsterdam, part of my learning exercise in implementing systems on public cloud servers – and of my current exercise in trying to simplify the engagement of AWS, Google Cloud Services and more in enterprise accounts, just like we did for DECdirect Software way back when. But that’s for another day.

 

Did you know 2.8% of your customers are dead?

Gravestone saying "Rest in Peace"

Those are the exact words I mentioned to the CEO of a B2C company, where customers were paying monthly subscription fees. Unfortunately, his immediate question back was “Okay, but how many of those are on Direct Debit?”. My reaction to his interest in the number of people paying for his service but not using it was that he’d do more damage to his brand than he’d earn in extra profits.

It’s long been an accepted view that marketing in IT circles is often a set of “Hail Mary” throws to attract new potential customers, with little of the precision of folks who hold that title in fast-moving consumer goods (FMCG) companies. I’ve sat in meetings with a roomful of IT reseller “marketing” folks to find mine was the only company present doing systematic testing to find out what worked and what didn’t, and using that learning to continuously improve. More a case of “getting the letter out” and losing the ability to learn anything; surely much better to send two different wordings out, to see which one pulled better – at the very least.

I’m reminded of one person I met who worked in the field for a chocolate vendor and, like all his industry colleagues, could relate to 1/10th of one percent shifts in his market share in the retail outlets he supplied through. He was expected to have an action plan in place if anything slipped a little, or to do more of anything that slightly increased his share. One day, fed up with selling chocolate bars, he decided to move to a company selling software for large IBM computer installations.

He walked in to see his new boss, and made the fundamental mistake of asking what his software products market share was in IBM Mainframe installations working in the Finance Industry and in the counties of Avon and Somerset. His new boss looked at him as if he’d arrived from the Planet Zog, and told him just to get on the road and sell something. He ended up thinking this was dumb, and set up his own company to fix the gap.

He elected to start sending out questionnaires to all the large IBM customer sites in the UK (there were, at the time, some 1,000-1,500 of them), getting a telesales team to help profile each site and to reflect the use of hardware and software products in each. He then sent out a quarterly summary, segmented by industry, of what everyone’s peers were using – so the survey participants saw value in knowing what their peers in like organisations were doing. He subsequently extended the scope to cover other vendors, and gradually picked up a thorough profile of some 30,000 installations, covering over 80% of Enterprise IT spend in the country.

At that point, he had an ever-evolving database of the mix of hardware and software in each installation, coupled with all the senior decision makers’ details, and even the names of IT projects both planned and underway in each. The last time I had a meeting with him, he could aim me at the best 5-10 prospects for my IT products and services that aligned with what my ideal customer looked like (in terms of associated prerequisite products they were already running) – a single rifle shot, allowing sales focus without spreading effort over many unproductive lead follow-ups. Marketing (and Sales) Gold. Expensive to use, but worth every penny.

He subsequently sold his company and the database to Ziff Davis, then to Harte Hanks – the same folks who compile the list of dead people (from published UK Death Certificates) that was part of the profiling exercise I mentioned above. They then sold the same database assets and its regular surveys to the company where it resides today.

Apart from that data, there are a number of other useful sources you can draw on. I once managed to persuade MySQL (before Sun, let alone Oracle, ownership) to get their customers profiled, just by relating postcodes we deduced from sampling contact addresses and/or from location information on their web sites. It turned out that 26% of their base existed in Systems Integrators, Web Development and Software companies, while the remaining 74% was flat as a pancake across 300 other SIC codes – very difficult to target as a whole unless you had the full list of customers, which only they did! There are also various mailing lists, MeetUps, forums and resources like GitHub where you can get a view of where specific developer skills are active.

All very basic compared to Consumer Marketing, where armed with a name, a date of birth and/or a postcode, you can deduce a pretty compelling picture of what your B2C customer looks like, family make-up, what they read and their relative wealth. When I was at Demon Internet (first UK Internet Service Provider), we could even spot one segment of very heavy users that, back in 1997, turned out to be 16-19 year olds living in crowded accommodation, playing online games and with no parental supervision of the associated phone costs. We also had the benefit of one external consultant who was adept at summarising 550 pages of BMRB Internet Survey number tables, producing an actionable and succinct 3-5 pages of A4 trends to ride on.

Today, with even the most expensive mobile smartphones starting to commoditise – and vendors looking to emphasise even the smallest differentiation – I wasn’t too surprised that Samsung in the USA has landed an ex-VP of Procter & Gamble to head its Marketing efforts going forward.

With that, the IT industry has now come full circle – and FMCG class Marketing skills will start to become ever more important in our midst.

 

“Big Data” is really (not so big) Data-based story telling

Aircraft Cockpit

I’m me. My key skill is splicing together data from disparate sources into a compelling, graphical and actionable story that prioritises the way(s) to improve a business. When can I start? Eh, Hello, is anyone there??

One characteristic of the IT industry is its penchant for picking snappy-sounding themes, usually illustrative of a future perceived need that customers may wish to aspire to – and to keep buying stuff toward that destination. Two terms de rigueur at the moment are “Big Data” and “Analytics”. They are attached to many (vendor) job adverts and (vendor) materials, though many are still searching for the first green shoots of demand from most commercial organisations – or at least taking a leap of faith that their technology will smooth the path to a future quantifiable outcome.

I’m sure there will be applications aplenty in the future. There are plenty of use cases where sensors will start dribbling out what becomes a tidal wave of raw information, be it on you personally, in your mobile handset, in low-energy Bluetooth beacons, or indeed plugged into the “On Board Diagnostics Bus” in your car – and aggregated up from there. That, or the rarer case where a company already has enough data locked down in one place to get some useful insights, and the IT hardware to crack the nut.

I often see stated needs for “Hadoop”, but know of few companies who have the hardware to run it, let alone the Java software smarts to MapReduce anything effectively on a business problem with it. If you do press a vendor, you often end up with a use case for “Twitter sentiment analysis” (which, for most B2B and B2C companies, covers a small single-digit percentage of their customers), or for consolidating and analysing machine-generated log files (which is what Splunk does, out of the box).

Historically, the real problem is data sitting in silos, and the inability of a largely non-IT-literate user to do efficient cross tabulations to eke a useful story out. Where they can, the normal result is locking in on a small number of priorities that make a fundamental difference to a business. Fortunately for me, that’s a thread that runs through a lot of the work I’ve done down the years – usually in an environment where all hell is breaking loose, everyone is working long hours, and high-priority CEO- or customer-initiated “fire drill” interruptions are legion. Excel, text, SQL Server, MySQL or MongoDB resident data – no problem. A few samples, mostly done using Tableau Desktop Professional:

  1. Mixing a year’s worth of complex quotes data with a customer sales database. Finding that one sales region was consuming 60% of the team’s Cisco configuration resources while selling only 10% of the associated products. Digging deeper, finding that one customer was routinely asking our experts to configure their needs, while their purchasing department bought all the products elsewhere. The Account Manager was duly equipped to have a discussion and initiate corrective actions; whichever way that went, we made more money and/or gained better efficiency.
  2. Joining data from sales transactions and from Accounts Receivable query logs, producing daily updated graphs of Days Sales Outstanding (DSO) debt for each sales region, by customer, by vendor product, and by invoice in priority order. The target was to reduce DSO from over 60 days to 30; each Internal Sales Manager had the data at their fingertips to prioritise their daily actions for maximum reduction – and to know when potential icebergs were floating towards key due dates. Along the way, we also identified one customer who had instituted a policy of querying every single invoice, raising our cost to serve and extending DSO artificially. Again, the Account Manager was equipped to address this.
  3. I was given the Microsoft business to manage at Metrologie, where we were transacting £1 million per month, not growing, with 60% of the business through one retail customer and overall margins of 1%. There are two key things you do in a price war (as learnt when I’d done John Winkler Pricing Strategy Training back in 1992), both of which need a quick run through per-customer and per-product analyses. Having instituted staff licensing training, we made the appropriate adjustments to our go-to-market based on the Winkler work. Within four months, we were trading at £5 million/month and had doubled gross margins, without any growth from that largest customer.
  4. In several instances, demonstrating 7- and 8-figure software revenue and profit growth, using a model to identify the key challenges (or reasons for exceptional performance) in the business. Every product and subscription business has four key components that, mapped over time, expose what is working and where corrections are needed. You then have the tools to ask the right questions, assign the right priorities and ensure the business delivers its objectives. This worked from my time in DECdirect ($0-$100m in 18 months), through Computacenter’s Software Business Unit growth from £80m to £250m in 3 years, to managing a team of 4 working with products from 1,072 different vendors (and delivering our profit goals consistently every quarter). In the latter case, our UK market share with the largest of those 1,072 vendors went from 7% to 21% in 2 years, winning their Worldwide Solution Provider of the Year Award.
  5. Correlating subscription data at Demon against the list of people we’d sent Internet trial CDs to, per advertisement. Having found that the inbound phone people were randomly picking the first “this is where I saw the advert” choice on their logging system, we started using a different 0800 number for each advert placement and took the readings off the switch instead. With that, we could track customer acquisition cost per publication and spot trends; one was that ads in “The Sun” gave nominally low up-front acquisition costs per customer, but very high churn within 3 months. By regularly looking at this data – and feeding results to our external media buyers weekly to help their price negotiations – we managed to keep per-retained-customer landing costs at £30 each, versus £180 for our main competitor at the time.
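The pattern in the first example above – joining two extracts on a shared key, normalising each to shares, and flagging mismatches – can be sketched in a few lines of Python. Everything here (region names, figures, the 3× mismatch threshold) is invented for illustration:

```python
# Hypothetical sketch: quote effort vs. actual sales by region,
# from two separate extracts. All figures are made up.
from collections import defaultdict

quotes = [  # (region, hours of configuration effort)
    ("North", 300), ("North", 300), ("South", 50), ("West", 350),
]
sales = [  # (region, product revenue)
    ("North", 10_000), ("South", 40_000), ("West", 50_000),
]

def share_by_region(rows):
    """Sum values per region, then normalise to a share of the total."""
    totals = defaultdict(float)
    for region, value in rows:
        totals[region] += value
    grand = sum(totals.values())
    return {region: value / grand for region, value in totals.items()}

quote_share = share_by_region(quotes)
sales_share = share_by_region(sales)

# Flag regions consuming far more quoting effort than they convert to sales.
for region in quote_share:
    if quote_share[region] > 3 * sales_share.get(region, 0):
        print(f"{region}: {quote_share[region]:.0%} of quote effort, "
              f"{sales_share.get(region, 0):.0%} of sales")
```

The same shape works for most of the joins above: reduce each source to shares over a common dimension, join, and surface the outliers for a human conversation.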

I have many other examples. Mostly simple, and not in the same league as the Hans Rosling or Edward Tufte examples I’ve seen. That said, the analysis and graphing was largely done out of hours, during days filled with more customer-focussed and internal management actions – ensuring our customer experience was as simple and consistent as possible, that the personal aspirations of the team members were fulfilled, and that we delivered all our revenue and profit objectives. I’m good at that stuff, too (ask any previous employer or employee).

With that, I’m off to write some Python code to extract some data ahead of my Google “Making Sense of Data” course next week – extending my 5 years of Tableau Desktop experience with some excellent-looking Google-hosted tools. And to agonise over how to reach someone who’ll employ me to help them, without HR dissing my chances of interview airtime for my lack of practical Hadoop or MapR experience.

The related business and people management smarts don’t appear to make it onto most “Requirements” sheets. Yet. A savvy manager is all I need airtime with…

Louboutin Shoes, Whitened Teeth and Spamalot

Picture of a Stack of Tins of Spam Meat

I run a WordPress Network on one of my Linux Servers in Digital Ocean, Amsterdam – the very machine serving you with this text. This has all the normal network protections in place, dropping virtually everything that makes its way in through what can be classified as a common attack vector. Unless the request to fire up root access comes from my fixed IP address at home, it doesn’t get as far as even asking for a password. Mindful of this, I check the logs occasionally, mostly to count how many thousand break-in attempts my security handiwork resisted, and to ensure no-one inappropriate has made it through. That apart, everything just hums away in the background.

A few days back, I installed the iOS WordPress app on my iPad Mini, and likewise the Android version on my Nexus 5 phone. Armed with some access credentials, these both peek at the system and allow me to update content remotely – even to authorise comments where I’ve chosen to allow them in, and to approve them for display where I’ve indicated I want that control. Even though I have only one WordPress site that accepts inbound comments, I started getting notifications that comments were arriving and awaiting moderation:

Screenshot of WordPress App, showing Spam arriving and attached to Gallery Images

The strange thing is that “Oktoberfest” and “Loddon Medal” were images on sites where I nominally had all comments switched off. However, WordPress has a default that lets people comment on images stored as attachments on the site, and also allows folks to insert trackback URLs – pointing to other (nominally more authoritative) sources of the same content. Both features now seem to have fallen into wide disrepute, and are used by bots to load up comment spam on unsuspecting WordPress sites.

Job number one was to shut the barn door on these – for which there is a nice “WP Comment Control” plugin that can deny all future capability to exploit these features, site by site, in your WordPress network. Duly installed and done. The next job was to find where all the comments had been left, and remove them; on inspection, they were all on a dummy template site I’d left as an example of work that I could easily replicate and tailor for a new paying customer. Over 10,500 comments and trackbacks were awaiting moderation, mostly relating to folks promoting teeth whitening services or selling red-soled Louboutin shoes. I’d never noticed these before – a nice side benefit of having my iPad and my Nexus phone plumbed in and telling me I had new content awaiting approval somewhere deep in my site hierarchy.

You can do things manually, 20 at a time, marking comments as spam, trashing them and then emptying the trash. None of the automated removal plugins appeared to work on a WordPress Network site (they only cleared things from the first site on the system), so a more drastic solution was needed to retain my sanity and my time. I ended up working out how the individual sites on the network mapped into MySQL database tables (the /ld3 site on my host mapped into table wp-5-comments in database wordpress), then did the removal with a few lines of MySQL – primarily: delete from `wp-5-comments` where comment_approved = 'spam' or comment_approved = '0' or comment_approved = '1';
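The purge itself is a one-liner, but it pays to rehearse a destructive DELETE before pointing it at the live server. A minimal sketch, using Python’s sqlite3 as a stand-in for the MySQL table (the `wp-5-comments` name comes from the mapping above; the column names follow the WordPress comment schema, and the sample rows are invented):

```python
# Rehearse the comment purge on a throwaway in-memory database before
# running the equivalent statement against the live MySQL server.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute('CREATE TABLE `wp-5-comments` ('
           'comment_ID INTEGER PRIMARY KEY, '
           'comment_content TEXT, comment_approved TEXT)')
db.executemany(
    'INSERT INTO `wp-5-comments` (comment_content, comment_approved) '
    'VALUES (?, ?)',
    [("Cheap shoes!", "0"),        # pending moderation
     ("Whiter teeth!", "spam"),    # already flagged as spam
     ("Genuine comment", "1")],    # approved
)

# The same WHERE clause as on the live server: pending ('0'), flagged
# spam ('spam') and approved ('1') comments are all removed.
deleted = db.execute(
    "DELETE FROM `wp-5-comments` WHERE comment_approved = 'spam' "
    "OR comment_approved = '0' OR comment_approved = '1'"
).rowcount
print(deleted)
```

On the live system the same WHERE clause ran via the mysql client; taking a mysqldump of the database first is cheap insurance against an over-eager DELETE.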

With that, all 10,500+ unwanted spam records were gone in 0.39 of a second. All locked down again now – until the next time the spammers’ arms race advances.