Showing posts with label Software Testing. Show all posts

Friday, July 27, 2018

Testing Encryption - 3 years of Dan Boneh's Online Cryptography Course

Three years ago in July, I completed Dan Boneh's online cryptography course with distinction through Coursera's Cryptography 1.  Since then, I've had the opportunity to use and test cryptographic systems at work and for hobbies.  Here are a few lessons learned when testing encryption.

I have found my fair share of bugs in the crypto we chose to use at work.  I've gotten into a routine when testing encryption used for message authentication:
  • Test the same plaintext multiple times.  Does it need to be different each time?  How much of the MAC is different each time?  It might help to explore the data your hashing function spits out as it can tell you how your hash function does what it does.
  • Replay it.  How can a user abuse identical MAC'd data if they replay it at a later date?  For a different user?  Can you add items to the plaintext that will allow you to validate not only the data but the source or timeframe as well?
  • Ensure your hashes are detecting changes. Is your MAC rejected if you change the data at various places within the message?
  • Rotate the key. Do you need a hash to survive a key change?  Usually you can just regenerate the data and re-MAC it, so figure out if you really need to use MACs over long lifetimes.  They're easy to compute.
  • Generate a bunch at once.  Is performance an issue with the service?  Most hashes are built for speed, but is yours?
For each of these failure modes, I'm looking mostly for hints of weakness.  I'm expecting pseudo-random noise, but how does my brain distinguish that from almost random noise?
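The checklist above can be turned into a handful of repeatable checks. What follows is only a minimal sketch, assuming HMAC-SHA256 from Python's standard library; the names `make_tag` and `verify`, the field layout, and the 300-second freshness window are all invented for illustration and are not what we run at work.

```python
import hashlib
import hmac
import secrets


def make_tag(key: bytes, user: str, issued_at: int, payload: bytes) -> bytes:
    """MAC the payload together with who it is for and when it was issued."""
    # Length-prefix each field so ("ab", "c") and ("a", "bc") cannot collide.
    parts = [user.encode(), str(issued_at).encode(), payload]
    message = b"".join(len(p).to_bytes(4, "big") + p for p in parts)
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, user, issued_at, payload, tag, now, max_age=300) -> bool:
    """Reject stale messages first, then compare tags in constant time."""
    if not 0 <= now - issued_at <= max_age:
        return False
    return hmac.compare_digest(make_tag(key, user, issued_at, payload), tag)


key = secrets.token_bytes(32)
tag = make_tag(key, "alice", 1000, b"amount=10")

# Same plaintext twice: a plain HMAC is deterministic, so the tags match.
same_input_same_tag = tag == make_tag(key, "alice", 1000, b"amount=10")
accepts_original = verify(key, "alice", 1000, b"amount=10", tag, now=1010)
# Change the data, the user, the time, or the key: every one must be rejected.
rejects_tamper = not verify(key, "alice", 1000, b"amount=99", tag, now=1010)
rejects_other_user = not verify(key, "bob", 1000, b"amount=10", tag, now=1010)
rejects_replay = not verify(key, "alice", 1000, b"amount=10", tag, now=5000)
rejects_old_key = not verify(secrets.token_bytes(32), "alice", 1000,
                             b"amount=10", tag, now=1010)
```

Binding the user and the issue time into the MAC'd message is one way to answer the replay questions: the tag stops being valid for anyone else or after the window closes. Whether a five-minute window is right depends entirely on the application.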

There are many times when you need to generate a unique but random value but don't have the space to use a GUID.  To evaluate if a solution will be "unique enough", check out the Birthday problem Wikipedia page, and this table of probabilities in particular.  Find out how many possible values exist (9 numeric digits = 10^9 ~= 2^30).  Compare on the table with that value as the hash space size versus the number of times you'll be setting this value.  This will tell you if the algorithm you want to use is sufficient.  If you are making long-term IDs that can only be created once, you obviously want the probability of collision to be extremely low.  If you can recover from a collision by creating a new transaction fairly readily, you might not need as much assurance.  I've used this to help drive a decision to increase unique token size from 13 to 40 characters, guide switching from SQL auto-numbers to random digits to hide transaction volumes, and ensure internal transaction IDs are unique enough to guide troubleshooting and reporting.
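If you would rather compute the number than read it off a table, the standard birthday approximation is short. This is a rough sketch: both function names are made up for illustration, and it assumes values are drawn uniformly at random, which a real generator may not do.

```python
import math


def collision_probability(space_size: int, draws: int) -> float:
    """Approximate chance that at least two of `draws` random values collide."""
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2N)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-draws * (draws - 1) / (2 * space_size))


def draws_for_probability(space_size: int, p: float) -> int:
    """Roughly how many draws until the collision chance reaches p."""
    return math.ceil(math.sqrt(2 * space_size * math.log(1 / (1 - p))))


nine_digits = 10 ** 9  # the 9-numeric-digit example from above
print(collision_probability(nine_digits, 10_000))
print(draws_for_probability(nine_digits, 0.5))
```

For the 9-digit case this works out to roughly a 5% chance of a collision after only 10,000 IDs, and even odds somewhere around 37,000, which is why nine random digits is a poor fit for long-lived IDs.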

Time and again, the past three years have taught me that cryptography must be easy for it to be used widely.  I've stayed with Signal for text messaging because it just works.  I can invite friends and not be embarrassed at its user interface.  It doesn't tick all the boxes (anonymity is an issue being a centralized solution), but it has enough features to be useful and few shortcomings.  This is the key to widespread adoption of encryption for securing communications.  Since Snowden revealed the extent of the NSA's data collection capability, sites everywhere have switched on HTTPS through Let's Encrypt. Learning more about each implementation of SSH and TLS in the course was both informative and daunting. I was anxious to get HTTPS enabled without rehosting the site on my own.  Early 2018, Blogger added the ability to do just that through Let's Encrypt.  It requires zero configuration once I toggle it on.  I can't sing its praises enough.  The content of this blog isn't exactly revolutionary, but this little move toward a private and authentic web helps us all.

Dan Boneh's Cryptography course continues to inform my testing.  The core lesson still applies: "Never roll your own cryptography."  And the second is how fragile these constructs are.  Randomness is only random enough given the time constraints.  Secure is only secure enough for this defined application.  Every proof in the course is only as good as our understanding of the math, and every implementation is vulnerable at the hardware, software, and user layers.  In spite of this, it continues to work because we test it and prove it hasn't broken yet.  I'm looking forward to another three years of picking it apart.

Monday, July 16, 2018

AES CBC Encryption on OpenVMS

It turns out that using the convenience function ENCRYPT$ENCRYPT_ONE_RECORD on OpenVMS for AES CBC does something squirrelly.  It will not let you pass in an IV.  Instead, ENCRYPT$INIT sets the IV to 16 bytes of \0.  In order to ensure your identical plaintext first blocks aren't identical after encryption, your first 16 byte block needs to be your true IV (a fresh random value for every record).  That block is XORed with the all-zero IV (a no-op) and encrypted, and the resulting ciphertext block is what gets chained into your second block, so the real data is uniquely encrypted.  OR just do it right and avoid the convenience method.  Use ENCRYPT$INIT properly.


"For AES, the optional P1 argument for the AES IV initialization vector is a reference to a 16-byte (2 quadword) value.
If you omit this argument, the initialization vector used is the residue of the previous use of the specified context block. ENCRYPT$INIT initializes the context block with an initialization vector of zero."
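
To see why a fixed zero IV matters, here is a toy sketch of CBC chaining in Python. It is an illustration under loud assumptions: `toy_block_encrypt` is a keyed hash standing in for AES (it cannot be decrypted), padding is skipped, and none of this touches the OpenVMS ENCRYPT$ routines.

```python
import hashlib
import hmac
import secrets

BLOCK = 16


def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for AES (an assumption for this sketch): a keyed hash truncated
    # to one block. It cannot be decrypted; it only lets us watch CBC chaining.
    return hmac.new(key, block, hashlib.sha256).digest()[:BLOCK]


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # CBC: each plaintext block is XORed with the previous ciphertext block
    # (the IV for the first block) before being encrypted.
    assert len(plaintext) % BLOCK == 0, "sketch skips padding"
    previous, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        mixed = bytes(a ^ b for a, b in zip(plaintext[i:i + BLOCK], previous))
        previous = toy_block_encrypt(key, mixed)
        out.append(previous)
    return b"".join(out)


key = secrets.token_bytes(16)
zero_iv = bytes(BLOCK)
record = b"ACCOUNT=12345678" + b"BALANCE=00000100"

# Zero IV, same record twice: the ciphertexts match, leaking equality.
leaky_a = cbc_encrypt(key, zero_iv, record)
leaky_b = cbc_encrypt(key, zero_iv, record)

# Workaround from the post: prepend a fresh random block as the "true IV".
safe_a = cbc_encrypt(key, zero_iv, secrets.token_bytes(BLOCK) + record)
safe_b = cbc_encrypt(key, zero_iv, secrets.token_bytes(BLOCK) + record)
```

With the zero IV, `leaky_a` and `leaky_b` are byte-for-byte identical, so anyone watching can tell when the same record is sent twice. With a random leading block, the two ciphertexts differ all the way through because the randomness is chained into every later block.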

Tuesday, June 12, 2018

Quotes from Dan Kaminsky's Keynote at DEF CON China

Above is Dan Kaminsky's keynote at the inaugural DEF CON China.  It was nominally about Spectre and Meltdown, and I thought it was immediately applicable to testing at all levels.  Here are some moments that jumped out at me:

On Context:

"There's a problem where we talk about hacking in terms of only software...What does hacking look like when it has nothing to do with software." 1:55

"But let's keep digging." Throughout, but especially 5:40

"Actual physics encourages 60 frames per second. I did not expect to find anything close to this when I started digging into the number 60...This might be correct, this might not be. And that is a part of hacking too." 6:10

"Stay intellectually honest as go through these deep dives. Understand really you are operating from ignorance. That's actually your strong point. You don't know why the thing is doing what it is doing...Have some humility as you explore, but also explore." 7:40

"We really really do not like having microprocessor flaws...and so we make sure where the right bits come in, the right bits come out. Time has not been part of the equation...Security [re: Specter/Meltdown] has been made to depend on an undefined element. Context matters." 15:00

"Are two computers doing the same thing?...There is not a right answer to that. There is no one context. A huge amount of what we do in we play contexts of one another." 17:50

[Re: Spectre and Meltdown] "These attackers changed time which in this context is not defined to exist...Fast and slow...means nothing to the chip but it means everything to the users, to the administrators, to the security models..." 21:00

"Look for things people think don't matter. Look for the flawed assumptions...between how people think the system works and how it actually does." 35:00

"People think bug finding is purely a technical task. It is not because you are playing with people's assumptions...Understand the source and you'll find the destination." 37:05

"Our hardest problems in Security require alignment between how we build systems, and how we verify them. And our best solutions in technology require understanding the past, how we got here." 59:50

On Faulty Assumptions:

"[Example of clocks running slow because power was not 60Hz] You could get cheap, and just use whatever is coming out of the wall, and assume it will never change. Just because you can doesn't mean you should...We'll just get it from the upstream." 4:15

"[Re: Spectre and Meltdown] We turned a stability boundary into a security boundary and hoped it would work. Spoiler alert: it did not work." 18:40

"We hope the design of our interesting architectures mean when we switch from one context to another, nothing is left over...[but] if you want two security domains, get two computers. You can do that. Computers are small now. [Extensive geeking out about tiny computers]" 23:10

"[RIM] made a really compelling argument that the iPhone was totally impossible, and their argument was incredibly compelling until the moment that Steve Jobs dropped an iPhone on the table..." 25:50

"If you don't care if your work affects the [other people working on the system], you're going to crash." 37:30

"What happens when you define your constraints incorrectly?... Vulnerabilities. ...At best, you get the wrong answer. Most commonly, you get undefined behavior which in the presence of hacking becomes redefinable behavior." 41:35

"It's important to realize that we are loosening the assumption that the developer knows what the system is supposed to do...Everyone who touches the computer is a little bit ignorant." 45:20

On Heuristics

"When you say the same thing, but you say it in a different time, sometimes you're not saying the same thing." 9:10

"Hackers are actually pretty well-behaved. When hackers crash does really controlled things...changing smaller things from the computer's perspective that are bigger things from a human's perspective." 20:25

"Bugs aren't random because their sources aren't random." 35:25

"Hackers aren't modeling code...hackers are modeling the developers and thinking, 'What did [they] screw up?' [I would ask a team to] tell me how you think your system works...I would listen to what they didn't talk about. That was always where my first bugs came from." 35:45

On Bug Advocacy

"In twenty years...I have never seen stupid moralization fix anything...We're engineers. Sometimes things are going to fail." 10:30

"We have patched everything in case there's a security boundary. That doesn't actually mean there's a security boundary." 28:10

"Build your boundaries to what the actual security model is...Security that doesn't care about the rest of IT, is security that grows increasingly irrelevant." 33:20

"We're not, as hackers, able to break things. We're able to redefine them so they can't be broken in the first place." 59:25

On Automation

"The theorem provers didn't fail when they showed no leakage of information between contexts because the right bits went to the right places They just weren't being asked to prove these particular elements." 18:25

"All of our tools are incomplete. All of our tools are blind" 46:20

"Having kind of a fakey root environment seems weird, but it's kind of what we're doing with VMs, it's what we're doing with containers." 53:20

On Testing in the SDLC

"We do have cultural elements that block the integration of forward and reverse [engineering], and the primary thing we seem to do wrong is that we have aggressively separated development and testing, and it's biting us." 38:20

"[Re Penetration Testing]: Testing is the important part of that phrase. We are a specific branch of testers that gets on cooler stages...Testing shouldn't be split off, but it kinda has been." 38:50

Ctd. "Testing shouldn't be split off, but it kinda has to have been because people, when they write code, tend to see that code for what it's supposed to be. And as a tester, you're trying to see it for what it really is. These are two different things." 39:05

"[D]evelopers, who already have a problem psychologically of only seeing what their code is supposed do, are also isolated from all the software that would tell them [otherwise]. Anything that's too testy goes to the test people." 39:30

"[Re: PyAnnotate by @Dropbox] 'This is the thing you don't do. Only the developer is allowed to touch the code.' That is an unnecessary constraint." 43:25

"If I'm using an open source platform, why can't I see the source every time something crashes? me the source code that's crashing...It's lovely." 47:20

"We should not be separating Development and Testing... Computers are capable of magic, and we're just trying to make them our magic..." 59:35


"Branch Prediction: because we didn't have the words Machine Learning yet. Prediction and learning, of course they're linked. Kind of obvious in retrospect." 27:55

"Usually when you give people who are just learning computing root access, the first thing they do is totally destroy their computer." 53:40 #DontHaveKids

"You can have a talent bar for users (N.B.: sliding scale of computer capability) or you can make it really easy to fix stuff." 55:10 #HelpDesk
"[Re: Ransomware] Why is it possible to have all our data deleted all at once? Who is this a feature for?!... We have too many people able to break stuff." 58:25

Sunday, June 10, 2018

Postman Masterclass Pt. 2

During my second Postman meetup as part of the Las Vegas Test Automation group, we were able to cover some of the more advanced features of Postman. It's a valuable tool for testing RESTful services (stronger opinions on that also exist), and they are piling on features so fast that it is hard to keep track. If you're a business trying to add automation, Postman is easily the lowest barrier to entry to doing so. And with a few tweaks (or another year of updates) it could probably solve most of your API testing.

The meetup covered the Documentation, Mock Server and Monitor functionality. These are pieces that can fit in your dev organization to smooth adoption, remove roadblocks, and add automation with very little overhead. Particularly, the Mock servers they offer can break the dependency on third party integrations quite handily. This keeps Agile sprints moving in the face of outside roadblocks. The Monitors seem like a half-measure. They gave a GUI for setting up external monitors of your APIs, but you still need Jenkins and their Newman node package to do it within your dev env. The big caveat with each of these is that they are most powerful when bought in conjunction with the Postman Enterprise license.  Still, at $20 a head, it's far and away the least expensive offering on the market.

Since the meetup, I've found a few workarounds for the features I wish it had that aren't immediately accessible from the GUI. As we know in testing in general, there is no one-size fits all solution.  And the new features are nice, but they don't offer some of the basics I rely on to make my job easier.  Here is my ever-expanding list of add-ons and hidden things you might not know about.  Feel free to comment or message me with more:

Postman has data generation in requests through Dynamic Variables, but they're severely limited in functionality. Luckily, someone dockerized npm faker into a RESTful service. This is super easy to slipstream into your Postman Collections to create rich and real-enough test data. Just stand it up, query, save the results to global variables, and reuse them in your tests.

The integrated JavaScript libraries in the Postman Sandbox are worth a fresh look. The bulk of my work uses lodash, crypto libraries, and tools for validating and parsing JSON. This turns your simple requests to data validation and schema tracking wonders. 

  • Have a Swagger definition you don't trust? Throw it in the tv4 schema validator. 
  • Have a deep tree of objects you need to be able to navigate RESTfully? Slice and dice with lodash, pick objects at random, and throw it up into a monitor. Running it every ten minutes should get you down onto the nooks and crannies.
This article on bringing in the big list of naughty strings is another fantastic way to fold interesting data into otherwise static tests. The key is to ensure you investigate failures. To get the most value, you need good logs, and you need to pay attention to your results in your Monitors.

If you have even moderate coding skills among your testers, they can work magic on a Postman budget. If you were used to adding your own libraries in the Chrome App, beware: the move to a packaged app means you no longer have the flexibility to add that needed library on your own (faker, please?).

More to come as I hear of them.

Tuesday, March 20, 2018

Behat AfterScenario, PHP Garbage Collection, and Singletons

In Behat, I added a singleton to our contexts to store things across scenarios, but I ran into trouble when trying to keep separation between my tests.  The storage object allowed me to be creative with builders, validators, and similar ways of reducing repetition and making the PHP code behind easier to read.  There was a problem though: it would randomly be cleared in the middle of a test.

The only thing I knew was the object would get cleared at roughly the same time.  I had a set of about 50 different tests in a single feature.  This would call an API multiple times, run validations on the responses, and then move on to the next test.  All the while, it would put information into the storage object.  The test would not just fail in the middle of a scenario; it would generally fail near the same part of a scenario every time.  It was timing, an async process, or something was clearing a logjam.

While designing the storage object, I had the bright idea to clear it with every scenario.  The singleton acts like a global variable, and a clear after each one would ensure data from one test didn't pop up in another.  To make sure I was running this at the last possible moment, I put the clear into the __destruct() method of my context class.  By putting the clear in the destructor, I gave PHP permission to handle it as it saw fit.  In reality, it sometimes left my scenario objects to linger while running the next (due to a memory leak or similar in Behat itself, or a problem in my code; I couldn't tell).

/**
 * Destructor
 */
public function __destruct()

I first stopped clearing the store and the bugs went away.  Whew!  But how could I make sure I wasn't contaminating my tests with other data and sloppy design?  I tried two things:

1) gc_collect_cycles() forces the garbage collector to run.  This seems to have the same effect of stopping the crashes, but it was kind of a cryptic thing to do.  I had to put it in the constructor of the Context rather than something that made more sense.

/**
 * FeatureContext constructor.
 */
public function __construct()
{
    gc_collect_cycles();       // Destroy lingering contexts before this scenario starts
    ApiContextStore::create(); // Bootstrap the store: creates an instance if needed
}

2) Putting in an @AfterScenario test provided the same protection, but it ran, purposefully, after every test was complete.  I'm not freeing memory with my clear, so relying on garbage collection wasn't a priority.  I just needed it to run last.

/**
 * @AfterScenario
 * Runs after every scenario
 */
public function cleanUpStore()

Monday, March 5, 2018

Postman Master Class Pt. 1

I gave a talk on using Postman while testing. We covered the UI, creating a collection, working with environment variables, and chaining tests with JavaScript.

A big surprise from this talk was how few testers knew about Postman to begin with. When I first started testing websites, I wanted a more reliable way of submitting http requests. The ability to save requests got me out of notepad and command-line cURL. The move to microservices only made it more useful to me.

By far, the biggest discovery was how many testers there were that had never explored its signature features. Environments and scripting make the instrumentation of integration testing almost effortless. Organizations that want automation but don't want to give the time can turn simple tests into bare bones system tests for very little further expense.

I'm planning a Part 2 where I can talk about Newman, the command line collection runner. I also want to demonstrate the mocking and documentation features. If a company adopts their ecosystem, it has the potential to make a tester's life much easier.  Even if it's only a tester's tool, it can help them communicate better with developers and reach into the product with greater ease.


Sunday, September 17, 2017

Review: Women in Tech

I finished reading @tarah's book Women in Tech. What better way to celebrate its paperback release than with a quick review.

Five years ago, I found my life turned inside out. People asked me deeply personal questions and questioned my basic competence. In the center of the maelstrom, I found comfort in a book with stories of people like me who were successful in spite of the difficulty. The stories were also paired with advice on how others had survived, thrived, and moved past the traumatic events.

In my case, my spouse had come out and I was coming to grips with my future as a straight half of a mixed orientation marriage. The book that helped me through that was The Other Side of the Closet: The Coming-Out Crisis for Straight Spouses and Families. Just knowing that I was not alone had a powerful influence. Therapy had helped; family could be supportive; friends might be weird. Those stories gave me the strength to say, "This too shall pass." 

For me, Women in Tech knocks it out of the park in a similar fashion. Concise, varied, authoritative women have lined up to share their experience making it in tech. Some faced abuse while others encountered discrimination. In the end, most felt the creeping fear of being an imposter (poignant in light of the abuse hurled at Equifax's Music Major CISO). For marginalized groups, simply knowing you're not alone can be enough strength for the day-to-day challenges.

The practical advice made it particularly useful for me. Coming to tech by way of tech support, I had no tutelage in interviewing, technical CV's, and salary negotiation. To this end, I've rewritten my resumé, registered a domain for this humble blog, and continue to try to organize a testing meetup in this desert town of mine. I don't know if each step individually will bear fruit, but together they make me feel less vulnerable to a manager's whim. I have a presence online and a skill to sell independent of any one job.

Broadly, Women in Tech has helped me understand the journey many of my co-workers have made. A fantastic tester with 20 years of experience who is comfortable in OpenVMS still expresses a lack of confidence in interpreting 'man words'. A skilled project manager guided countless projects from C-suite dream to customer reality while being a betimes single mom.  Being so broadly defined, tech needs diverse voices at all levels, and it particularly needs women and their contributions supported wherever possible.

There are plenty of gems in the book that I can't begin to address. My heart broke when the advice had to find a balance between optimism and reality. Having my spouse, an engineer by trade, transition made me want to learn more about the trans-in-tech experience. The constant refrain of Impostor's Syndrome makes me want to look for research papers. It is clear that Tarah has captured experiences with a depth and variety unavailable elsewhere.

I wouldn't be a hacker if I didn't mention the brain testing crypto puzzles at the heading of each chapter. Themed on famous women in tech, the learning curve is steep. I am currently stuck and, as the book makes perfectly clear, progress can only be made with help from all sides.

Monday, October 26, 2015

Testing uTest: or How I Learned to Stop Worrying and Love the Gig Economy

I’m a member of an online community of testers called uTest (the practitioner-facing side of Applause).  The company hosts a social network for testers as well as offers short-term gigs to its users (the cliched Uber for testers).    I was called on this weekend for my first gig: testing a payment method in Taxis.  It was pleasant, if stressful, and it made me think of ways a company could take advantage of it to expand the perspective on their products. 

After the initial invite to the test cycle, I communicated with a project manager via Skype to ensure I was able to carry out the test scenarios.  I brought to the table a specific model of phone and a verified ability to pull debug logs from it (thanks Verizon for turning off the op codes on my S4; I resorted to adb logcat).  They provided technical assistance and reimbursement for the transactions, but the primary incentive was a reward for test cases completed.

Throughout, I felt like a skilled secret shopper rather than a functional tester.  I was asked more about the non-software components of the launch than the app or phone functionality.  I reported on the advertisement of the feature, the education of the employees, and the capability of the equipment that supports the feature. In spite of expectations from the participating companies that the advertisement would match hardware would match training, this happened 0% of the time, and no employee had been informed why their equipment had been updated.  I wasn’t the only tester in this project on the ground, and the other testers saw related issues, and none had all their ducks in a row.  In all, the information they were most excited about was the boots-on-the-ground report I provided.  It was fascinating to see a live product launch from that perspective, and doubly so considering my history in this product space. 

The final bulk of time spent on this gig was an experience report.  Complete with detailed feedback, photos of the equipment, and other evidence, this is where I was able to comment on the process as a whole.  From a testing perspective, I was able to provide detailed UI/UX feedback, evaluate the consistency of the process, and help them understand how much of their plans actually came off during the launch period.  There was some repetition in these reports.  One was a document with pictures, the other was a survey that linked to the doc and a third was a JIRA-in-spreadsheets for tracking testers.  These reports were all submitted to the project manager for feedback, and I received an up/down "accepted" in less than 24 hours.  While there is definitely room for improvement on the platform that would reduce tester overhead, it wasn't enough of a burden to avoid gigs.

Participation in this project taught me that coordination is hard, education of low-skill workers is even harder, and successful coordinated nation-wide launches like this are next to impossible.  This mirrored my experience with other companies.  There are always bits and pieces of projects that do not make their way down the chain to the workers that are face-to-face with customers, so having a boots-on-the-ground perspective is vital.

Overall, a company can leverage uTest services to add layers of testing to their launch activities over and above internal testing.  Specific and targeted verification is where they are the strongest.  They can provide feedback as to whether an uneducated but motivated customer can make it through the new process right now.  Specific errors and third-party verification can be farmed out for feedback through such a relationship: things the company wants to do but can’t do efficiently given a small staff or tight timelines.  This can provide a company with an enhanced view of the product and help challenge their assumptions before their customers do.

Note: edited to be a little more coy about who I was testing for and what I was testing.

Tuesday, August 11, 2015

Responding to Vulnerabilities and Weaknesses in your Product: A Tester's Perspective

Compare and Contrast: Tesla's response to researchers: 

With Oracle's: 

To be fair, no company should have to sift through an automated report from a static analysis tool.  It’s not worth their time.  In fact, the one part of the Oracle blog’s tone that isn’t completely unproductive is, “Do the research for yourself!  Give me exploits or give me death!”  As a Tester, this is the core of bug advocacy, and I want to destroy the trust lazy researchers put in automated scanners, lazy managers put into automated checking, and the lack of human interaction endemic in development in general.

That being said, chiding someone for spending their own coin to find an exploit with, “But you really shouldn’t have broken the EULA.  Nanny Nanny Boo Boo,” is unproductive at best and an invitation to become the target of malicious actors at worst.  No one cares about your EULA.  Not even the government gives it the time of day.  Your tantrum just makes that many more people want to do things to piss you off.

Monday, February 16, 2015

The Uncanny Valley of User Interfaces

Of all the things to get peeved about, you wouldn't think commercials would be one of them.  In spite of avoiding commercials by cutting the cable cord, I have found myself employed at a place that runs ESPN 24/7.  While it's not as bad as commercials on Fox News, certain commercials have begun to irk me for reasons that only a software guy can understand.  In general, I believe this revulsion can teach us how to test user experiences.

To start, allow me to present two exhibits: Jillian Michaels for Nordic-track and Trivago Guy for Trivago.  Have you watched them?  Good.  Are ads for them already populating your sidebar?  Allow me to apologize.  So, did anything bother you?

Now, here is a second exhibit: Fiddler on the Roof.  I'm sorry if I ruin this for you, but the fiddler is not actually playing his instrument.  In high school, I saw this movie once in class, and again after I spent a year in orchestra.  The second time through, I knew as little about playing the violin as one can, and it still bothered me that the fiddler wasn't successfully playing in sync with the music.  The distraction caused by bad miming is well documented.  Recent Amazon series Mozart in the Jungle has it.  Other movies aren't immune either.  Even a talented musician can be affected by it without equal attention to production (try not to notice the out-of-sync miming and white truck in the background 52 seconds in).  The core problem is that you need to train someone who is good at acting to fake a highly skilled activity convincingly enough to keep the audience's focus.  If the audience knows even a little, their belief is unsuspended, and you can lose all credibility.

Normally, this is fine.  In a two-hour film or season of a television show, the viewer is pulled in by the drama.  The story isn't about how well the artist can play the violin; it is about the character's story. Also, most directors can rely on the majority of their audiences not knowing what skilled activities like playing the violin and hacking actually look like.  Camera tricks can make some instruments easier to fake than others.  Furthermore, some miming has become acceptable and celebrated through parody.  We know it's hard to convey on film.  Should that mean we don't tell this story?

Unfortunately, there is a class of film that cannot rely on the social contract to ignore non-experts acting out difficult skills.  Its short format leaves no time to pull a person back in after having been jarred from their witless stupor.  What pitiless medium is this?  Commercials.  In 30 seconds or less, the disingenuous spokesman is called out.  The thankless medium that pays for most of our entertainment is reviled for cheesy effects, leaps of credulity, and now one more heresy: poorly mimed user interfaces.

Let's return to our exhibits, shall we?  Nordic-track wants to sell us another piece of exercise equipment that we will only use as long as we are paying for it.  To do this, they've strapped on a touchscreen to navigate workouts.  What does Jillian Michaels use to navigate?  Two fingers, and she watches the screen as she swipes left.  Thinking back to your own interactions with iPads, touchscreen kiosks and mobile phones, when have you ever watched the old screen disappear?  And when have two fingers ever been used to navigate a touchscreen?

The second is Trivago Guy.  Not only has he drawn attention for being creepy, but his ham-handed interaction with the user interface makes me cringe.  Poking at points on a calendar, full-handed presses of the Search button, and similar miming make me look up from my desk and gag.  Similar interactions with after-effects like those in the Endurance Extended Warranty commercial make me wonder if anyone thought to proof the commercial before buying ad spots day after day, year after year.  An alternative explanation would be that the producers honestly think this is the way people interact with computers.  Either one disarms the viewer and places the product as unfavorable in their eyes.

I would like to propose that each of the above cases can be grouped together as potential examples of the Uncanny Valley.  As a movie viewer familiar with how a violin is played, I connect notes and the movement of bow in a way that the uninitiated cannot.  I reject the characterization as invalid for a brief period, but my emotions pull me back in with other human interactions elicited by the actor's performance.  For these commercials, this does not happen.  The terrible user interface interactions remove focus from the message of the commercial, and it is judged as unfit just long enough that I reject the product on offer.  Worse, subsequent viewings reinforce my first impressions.

Generalized lessons in the User Experience design space are many.  After testing a new user interface, I have found it helpful to let the uninitiated take it for a spin.  While I have long been desensitized to bad interaction by other considerations ("It actually works once you get used to it!" being one, "I just want to be done with this." being another), initiates see the unnatural interaction, and one can read the revulsion in their faces even when they don't come right out and say it.  This commercial for the new product is a failure, and I don't wait around to see them lulled into the same sense of security.  It stinks, they know it, and now I do too.  While users might not be expected to use a new interface right away, something that is counter-intuitive from the start should be avoided.  Depending on the medium, counter-intuitive steps can be overcome, but not without good decisions elsewhere to draw the user back in.

Tuesday, January 6, 2015

Context-Driven Testing, An Education

Coming into my fifth year of Software Testing, I began to rethink it as a discipline. The current debate is between traditional methods of testing and more modern schools of thought:
  • Entrenched methods are represented by the International Software Testing Qualifications Board (ISTQB) and its many local equivalents. Certification is their path to expertise, with accompanying wares of training sessions, books, tests and standards.
  • Context-driven Testing focuses more on a set of tools and skills that typify testing.  Advocates offer classes at conferences, but certs and best-practice are four-letter words.  They state that there are no best practices, and a tester knows best how to apply the tools to explore, experiment and learn about the system under test.
The conflict seemed from the sidelines like a pitched battle over the future of testing. The ISTQB and affiliated consultants had history and 300,000 certifications on their side.  The context-driven school was relatively young, but it had a few charismatic evangelists and professional results that could not be ignored.  It was plain that I needed to sort this out.

Something Just Didn't Feel Right

I strode onto this battleground in 2011 as a new manager and new tester.  Promoted from an application integration team, I was used to working with outside developers while using and abusing buggy product.  This did little to prepare me for the reality of testing: limited time and endless defects!  I dove into the online community in the hopes that it would help sort the good from the bad.  What I found to be the central influences were conferences, consultants and blogs.

At testing conferences, half the talks were advertisements for bug trackers, test case repositories and automation frameworks.  These were great for managing a project, but they didn't support the essence of testing: finding more defects.  Expensive tutorials before the conferences showed similar taint: how to use this tool, best practices for testing this specific type of thing, certification in a process and not a portable technique.  The cost was the most surprising thing: Thousands of dollars for something I couldn't justify to myself, the eager tester with a training budget to burn.

Delving into webinars brought more despair.  A demo of the latest automation tool invariably led to a pitch to get me to purchase something.  The topic purported to discuss principles, but I ended up in the morass of industry jargon.  I learned how to write, automate, and justify my time, but I was no closer to actually finding bugs in a more efficient manner.  And what's more, I was spammed by vendors for months after.  I found zero techniques that were universal.

Finally, the blogs showed a glimmer of hope.  Some people wrote about techniques.  Others wrote about test cases and how they managed their overhead.  Still others advocated that QA should be involved earlier in the development process.  Nowhere did anyone extol the virtues of their test management system, bug tracker or automation tool in helping them find bugs.  This was a breath of fresh air, but it still felt stunted and directionless.  My closest analogue, software development, spawned a generation of people applying agile to everything from families to manufacturing, but there wasn't a similarly powerful framework for testers.  I started to feel that no one was excited about my new career outside of the context of getting paid for it.

This muddled world left me questioning: Shouldn't there be some "ethos of testing" to unify our efforts just like in agile software development, lean manufacturing, and so forth?  Why do vendors have a place at the table at industry conferences?  Why isn't anyone embarrassed that the main competitor to major test management software is a spreadsheet?  Who cares about my expertise and not just my dollars?

"Answers" and Answers

For a long time, I thought the answer was in certification.  Surely, if ever there was an organization that could be a cheerleader for quality, it would be the ISTQB.  However, the reality is much different.  Manuals are filled with process, conferences host vendors and not practice sessions, and training classes are about extracting fees and not learning techniques.  The certification exam that guarantees your resume goes to the top of the pile is a laughingstock, and so is the organization that proctors it.

The alternative came through an unlikely route: Twitter.  Long a tool of celebrity publicists and companies looking to engage directly with individuals, Twitter also has a reputation as the way to communicate within subcultures.  Have an idea?  Publish it in under 140 characters.  Want to learn the pulse of an industry?  Follow its leaders.  Computer security wonks, hackers, and now testers joined my follow list, and I was soon introduced to a new debate: is certification a waste of time?  I'd found my people.

The new school touted something called Context-driven Testing.  Instead of best practice, effective testing was supposed to be driven by context.  Instead of test cases, a set of tools was taught that could be used depending on the product (mainframes are tested differently than mobile devices).  Even among superficially similar products, the most effective testers would make judgement calls based on the needs of the customer and the time available.  Testing was not a set of rigid processes, but a scientific exploration of the software.  The knowledge gained by testing increases the confidence of the organization in the software.  In other words, testing uses the time we have available to challenge the developers' assumption that the product works.  This sounded like the meat I was looking for, and the results were amazing too.

In an experiment, I had our work study group put down the ISTQB manual with its test cases, and I instead introduced them to exploratory techniques.  We first learned about the requirements and returned a bunch of questions to the developer.  Then we tested without scripts and tracked our coverage on a mind map.  It was the first time we had been prompted to field trial a technique in our study session.  The best part was that a person with very little previous experience in testing was able to pick up the technique almost organically.

This revelation about software testing was what we were all looking for, and it was delivered through experience instead of pronouncement.  James Marcus Bach, one of the proponents of Context-driven Testing, compared the ISTQB and certification organizations to the medieval medicine of Galen.  People wrote down what testing was and proceeded to bleed their employers without knowing why it wasn't finding bugs.  Testers were outsourced instead of valued as their techniques were old or ineffective.  Yet in spite of all this, the consultants and conferences kept making money printing outdated works of questionable value.  Once context-driven techniques come to light, the old ways start dying.  We can only hope this continues so that meaningless certs are no longer valued by testers, managers and HR alike.

Where Next?

After stumbling through the world of testing for a few years, I have abandoned certification as a path to expertise.  As with computer security, network administration and technical support, certifications are a poor way of communicating true expertise.  This revelation places testers firmly in the camp of indispensable elements of the development organization.  They are not monkeys running scripts but knowledge workers with a valuable investigative skill who challenge the product from all angles.  They cannot be outsourced if you hope to be successful, and they cannot be replaced by automation.

I am beginning a new training regimen with my testing colleagues based around Context-driven techniques.  We hope to learn the techniques and apply them to our current projects and continually grow our skills in this new framework.

Further Reading

  1. A Context-driven Testing manifesto of sorts
  2. Black Box Software Testing, Coursework in Context-driven Testing
  3. Rapid Software Testing, James Marcus Bach's courses on Context-driven approaches
  4. Exploratory Testing techniques explained
  5. Testing is not Automation.  Why Automation is another tool and not a cure all.