Thursday, February 16, 2017

Sprint Reviews and Stakeholders. Something worth discussing by Paul Heidema


I came across this post today from Paul J. Heidema.

It opens up an important discussion related to Stakeholders at Sprint Reviews.


Stakeholders have feelings and want to contribute as much as anyone else. We're all in it together.


Paul brings up some concepts related to getting stakeholders up-to-speed so they can best contribute at Sprint Reviews.


Give it a read...


It may get you thinking...  



What have we done to help our stakeholders feel like valuable contributors to our work?

Thank you Paul for the valuable post.



Here it is...



https://www.linkedin.com/pulse/how-effective-stakeholders-sprint-review-scrum-paul-j-heidema





Tuesday, February 7, 2017

Tech Post: Unit Testing for Distributed Systems


I recently came across this very interesting presentation and article related to Unit Tests for distributed systems and felt it was worth sharing.

Presentation:
https://www.usenix.org/conference/osdi14/technical-sessions/presentation/yuan

Article: 
https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-yuan.pdf


From the technical article...
almost all (92%) of the catastrophic system failures are the result of incorrect handling of non-fatal errors explicitly signaled in software.

and this one...

in 58% of the catastrophic failures, the underlying faults could easily have been detected through simple testing of error handling code.

This sentence really caught my attention...
In fact, in 35% of the catastrophic failures, the faults in the error handling code fall into three trivial patterns: (i) the error handler is simply empty or only contains a log printing statement, (ii) the error handler aborts the cluster on an overly-general exception, and (iii) the error handler contains expressions like “FIXME” or “TODO” in the comments.

If you are working on distributed systems and are wondering where to put effort in automated testing, it might be worth grabbing a coffee and spending some time on this.

At a minimum, it might get you thinking about scanning your code for the words FIXME or TODO in catch blocks and putting an end to that!
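
As a rough illustration of that kind of scan, here is a minimal Python sketch. It assumes Java-style source code and uses naive brace matching, so braces inside strings or comments will confuse it. The function name find_suspect_handlers and the sample snippet are made up for this example. It only looks for empty handlers and TODO/FIXME markers, not the log-only handlers the paper also describes.

```python
import re

# Start of a Java-style catch block, e.g. "catch (IOException e) {"
CATCH_START = re.compile(r"\bcatch\s*\([^)]*\)\s*\{")
MARKER = re.compile(r"\b(TODO|FIXME)\b")


def find_suspect_handlers(source: str) -> list[tuple[int, str]]:
    """Return (line_number, reason) for catch blocks that are empty
    or that contain a TODO/FIXME marker."""
    findings = []
    for match in CATCH_START.finditer(source):
        # Walk forward to the brace that closes this catch block.
        depth = 1
        pos = match.end()
        while pos < len(source) and depth > 0:
            if source[pos] == "{":
                depth += 1
            elif source[pos] == "}":
                depth -= 1
            pos += 1
        end = pos - 1 if depth == 0 else pos  # tolerate unbalanced input
        body = source[match.end():end]
        line_no = source.count("\n", 0, match.start()) + 1
        if not body.strip():
            findings.append((line_no, "empty handler"))
        elif MARKER.search(body):
            findings.append((line_no, "TODO/FIXME in handler"))
    return findings


# Hypothetical sample input, not taken from any real code base.
SAMPLE = """\
try {
    connect();
} catch (IOException e) {
    // TODO: handle this properly
}
try {
    flush();
} catch (Exception e) {
}
try {
    close();
} catch (IOException e) {
    log.error("close failed", e);
    throw e;
}
"""

print(find_suspect_handlers(SAMPLE))
# → [(3, 'TODO/FIXME in handler'), (8, 'empty handler')]
```

A real code base would probably be better served by a proper parser or a static analysis tool, but even a crude scan like this can start the conversation.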

There's more that I could add, but it's probably best if you just read this article for yourself.

https://www.usenix.org/conference/osdi14/technical-sessions/presentation/yuan

Enjoy


Saturday, December 17, 2016

What is an API for non-technical people

During conversations over the last few weeks, I have been asked more than once by non-technical people... "What is an API?"

As technology evolves to have more interconnected systems, many will find the term API becoming a larger part of their lives. This includes both business people and technical people alike.

I thought I would share an analogy to help those that are less technical understand some simple concepts to get you started on your own journey of learning, or just to help you feel better when talking with someone about the topic :->

If you are interested in incremental, multi-team delivery, you will find these concepts important to understand.  

There are other concepts and the following is definitely an over-simplification, but this should get you started.  My goal is not to make you an expert, but to get you ready to have some conversations about this in your environment.






  • Public interface
  • Keep the implementation private
  • Don't make something public unless you really have to

For the purpose of this post, consider an Interface to be .... 

The means by which you interact with something to send and/or receive information or change of state.

An API is an Application Programming Interface.  To extend the definition, consider an API to be...  

The means by which you interact with a Program or System to send and/or receive information or change of state.


For the technical people out there.. Yes, I know this is an oversimplification.




The Automobile gas pedal.

Public Interface - Using the interface is simple. Push the pedal forward to increase the flow of gas to the engine. Ease off the pedal and the opposite happens.

The results are familiar to us: (a) the vehicle goes faster, (b) the engine revs faster, (c) the vehicle goes slower, or (d) the engine revs slower.

It is important to understand, there is no interface through the gas pedal for speed, torque or anything else.  Those are other systems that have their own interfaces.

Keep the implementation private - Imagine for yourself that to know how to push the gas, you needed to know:

- How many cylinders the vehicle has
- Is it gas or electric
- What size engine
- What type of engine computer
- etc.

If you needed these to interact with the gas pedal, it would certainly be more difficult to drive.

Imagine that before you pressed the gas pedal, you had to calculate the force necessary based on engine size and the angle of the hill you were on, and then figure out which cylinders to turn on, and in what order.

Don't make something public unless you really have to.

These details are referred to as Implementation details. A good interface keeps this information private (not visible to the person or system using it).   

By using this approach, you could, for instance, change your engine from one type to another, or get an updated engine computer.  You might see different results when you press the gas pedal, with the evidence coming from other interfaces (e.g. the speedometer), but you would not need to change anything in the current interface or learn anything new to use it.
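
For the technical readers, the gas pedal analogy could be sketched in Python roughly like this. All of the class names, method names and numbers below are invented for illustration. The point is only that the driver uses press_gas_pedal the same way no matter which engine sits behind it.

```python
class GasEngine:
    """One hidden implementation. The driver never talks to this directly."""

    def power_for(self, pedal_position: float) -> float:
        return 200.0 * pedal_position  # made-up numbers


class ElectricEngine:
    """A different implementation with the same internal contract."""

    def power_for(self, pedal_position: float) -> float:
        return 300.0 * pedal_position  # made-up numbers


class Car:
    def __init__(self, engine):
        # Leading underscore: private by convention (implementation detail).
        self._engine = engine
        self._power = 0.0

    def press_gas_pedal(self, position: float) -> None:
        """Public interface: 0.0 is released, 1.0 is fully pressed."""
        if not 0.0 <= position <= 1.0:
            raise ValueError("pedal position must be between 0.0 and 1.0")
        self._power = self._engine.power_for(position)

    def read_power_gauge(self) -> float:
        """A separate public interface, in the spirit of the speedometer."""
        return self._power


# The driver's "code" is identical for both engines.
for engine in (GasEngine(), ElectricEngine()):
    car = Car(engine)
    car.press_gas_pedal(0.5)
    print(type(engine).__name__, car.read_power_gauge())
# → GasEngine 100.0
# → ElectricEngine 150.0
```

Swapping the engine changes what the gauge reports, but nothing about how the pedal is used.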

Don't make something public unless you really have to - This concept is extremely important for those that are growing systems incrementally or organically. Think about it this way... 

Once you make an interface public in a complex system, it may be almost impossible to remove that interface from future iterations.

This situation will be compounded in systems that take years or generations to change or where interaction from other systems may rely on these interfaces long into the future.

In the example of our car, let's assume that someone said "You know, we could make the gas pedal really awesome if we allowed the operator to press on the right or left of the pedal by angling their feet to give a different acceleration potential".  

This might make sense in your environment (let's say you have a specialised product for race car drivers and that would improve their lives significantly!). Innovation is cool that way.

My goal here is to have you consider this: Would a different interface be appropriate (a hand control, for instance)? Could we have a separate control for Normal Acceleration or Faster Acceleration? (Think about this... some vehicles have a Sport Mode.)

There are two potential considerations should you decide to add this really cool interface Publicly (in your current application).

In a distributed application or system, something or someone will use a public interface! This means that you may need to have that ability in your system forever. You may never take it away (well, not easily at least).

If you realize later that you should have made a different interface (Sport Mode), it will be too late. You have made the interface public.  


  • People may have already created driving schools for the new accelerator system
  • Someone may have created a special shoe to use it
  • Control systems may have been built to control the pedal from a manual control that converts it to suitable pressures on the gas pedal 
  • And so on.

To complete the analogy to software or systems development: now, any time someone wants to use the interface, they have to spend time and effort learning and understanding acceleration concepts when all they wanted to do was press the gas pedal.

I hope this helped the non-technical person to be able to have some knowledgeable conversations about Application Programming Interfaces or APIs.

There are Interfaces at different levels in your vehicle. Examples include....

- Engine computer
- Fuel pump
- Light switch controller
- Engine itself (think electric)
- Speedometer
- Tachometer

Technically oriented people will be using this term often in the future. APIs and discussions will become more prevalent as technology becomes more pervasive. The Internet of Things (IoT) is mostly about APIs for starters.

Hopefully, I have given you a bit of a non-technical description that helps you start your learning journey about this topic.

Actually, even communication and conversation are forms of Interface between humans. That's a topic for another time :->

Enjoy.

Tuesday, December 13, 2016


A simple thought today....

One of the smartest people I ever met showed me that overcoming "fear of loss" would improve my life. 

He was right. 

I love change! :-)




This post is somewhat related to this one.


Sunday, December 11, 2016

Consider the principles of the Agile Manifesto carefully. A thinking tool.


How to read this post: 

Read the word or grouping of words.

When you see the "........." Stop, pause and think about the word or phrase that just passed. 

Imagine for yourself what it would take to have a Culture  that supports this one thought or idea. 

Try not to rush the evaluation of each phrase as it relates to your environment.

Then, move to the next word or group of words. 

At the end of the exercise, you will find the specific agile principle considered.

You could repeat this thought process for any of the principles of the Agile Manifesto.



I will use one of the principles as an example...




Build projects around   


.............


motivated individuals


.............

Give them the environment   


.............


Give

.............


support they need


.............


they need



.............


trust them


.............


trust them to get the job done


.............


trust them


.............


job done


.............


done


.............


"Build projects around motivated individuals. 
Give them the environment and support they need, 
and trust them to get the job done."

Imagine for yourself that your organizational culture has changed to the point where these principles are embraced and part of everyday life.

I know from experience that an in-depth discussion on any of these words or phrases and their meanings and implications could be enlightening for many.

The rest is up to you.




References:

Agile Manifesto - http://agilemanifesto.org/

Agile Manifesto Principles - http://agilemanifesto.org/principles.html







Tuesday, December 6, 2016

Techpost - The Test-Maintain Loop saved me during a Jenkins Server infrastructure upgrade

Technical Post (just a warning for the non-technical follower).

As some of you know, earlier this year I created a first version of an Ansible Playbook Test framework and put it on Ansible Galaxy.  


Yesterday, this tool saved me from creating a disaster on my Production Jenkins CI instance on AWS (Amazon Web Services). 

I'd like to share that story as an example for others of what is possible if you include testing as part of your Infrastructure as Code strategy.

How I was saved yesterday:

(some details removed for security and simplicity reasons).

Execute the Playbook...
ansible-playbook -i Inventory/CASPAR/staging/ CASPAR_setup_jenkinsservers.yml -u ubuntu --ask-vault-pass --ask-sudo-pass
The result is a new AWS server instance with Java 8 loaded, appropriate users and groups set up, and a base Jenkins image loaded.  The key here is SETUP (the minimum needed to get a server into MAINTAIN status). The server is in "Staging".

Then, the repetitive playbook is executed. This one runs on a regular basis to keep servers up-to-date in all environments. In my case, I execute it for Dev, Staging and Production (the same playbook is used and applied to all 3 environments).

ansible-playbook -i Inventory/CASPAR/staging/ CASPAR_maintain_jenkinsservers.yml -u ubuntu --ask-vault-pass --ask-sudo-pass

Note: The only difference in the name is the word "maintain". 

Note: To run the same playbook in Prod or Dev, I simply run the same playbook against /CASPAR/prod/  ( Ansible's dynamic Inventory auto-finds the appropriate machines based on tags )
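
To make the naming convention concrete, here is a small Python sketch that builds the same command lines shown in this post. The helper name playbook_command is made up for this example, and the "dev" inventory directory name is an assumption (only the staging and prod paths appear above). It just builds the argument list; it does not run anything.

```python
STAGES = ("setup", "maintain", "test")
ENVIRONMENTS = ("dev", "staging", "prod")  # "dev" directory name is assumed


def playbook_command(stage: str, environment: str) -> list[str]:
    """Build the ansible-playbook argument list for one stage and environment.

    Only the stage word in the playbook name and the inventory
    directory change between runs; everything else stays the same.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    return [
        "ansible-playbook",
        "-i", f"Inventory/CASPAR/{environment}/",
        f"CASPAR_{stage}_jenkinsservers.yml",
        "-u", "ubuntu",
        "--ask-vault-pass",
        "--ask-sudo-pass",
    ]


print(" ".join(playbook_command("maintain", "staging")))
print(" ".join(playbook_command("maintain", "prod")))
```

The argument list could be handed to something like subprocess.run if you wanted a wrapper script, though typing the command by hand works just as well.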

Then, while the machine is still in Staging, the following test playbook is executed....
ansible-playbook -i Inventory/CASPAR/staging/ CASPAR_test_jenkinsservers.yml -u ubuntu --ask-vault-pass --ask-sudo-pass

The "test" playbook executes all predefined tests to ensure the server is in good shape. 
If there are no errors, all that needs to happen is for the machine to be re-tagged in AWS from Staging to Prod. The next time the maintain playbook is executed, it will apply any appropriate changes using the SAME playbook as before (different Gateway addresses, different database connection string, etc).

Then of course, the TEST playbook is executed again (one final test).

The Test playbook can also serve as a Governance check playbook and could be executed by the same team or externally where needed. It provides a means for safer, more comfortable changes, while also providing a built-in governance component if needed.

Yesterday, when I ran the tests in Staging I received an error about missing packages.


ansible-playbook -i Inventory/CASPAR/staging/ CASPAR_test_jenkinsservers.yml -u ubuntu --ask-vault-pass --ask-sudo-pass > test.log
grep "TEST_PASSED" test.log
grep "TEST_FAILED" test.log

I received the message...

 "msg": "TEST_FAILED: package xxxxxx expected present "

(xxxxxx is a hidden package only for this post for security reasons).
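
The two grep commands above could also be folded into one small Python helper that summarizes the log. This is only a sketch: the function name summarize is made up, the TEST_PASSED line format is assumed to mirror the TEST_FAILED message shown above, and "yyyyyy" is a placeholder package name.

```python
import re

# Captures the marker and the message text up to the closing quote.
RESULT = re.compile(r'(TEST_PASSED|TEST_FAILED):?\s*([^"]*)')


def summarize(log_text: str) -> tuple[list[str], list[str]]:
    """Return (passed, failed) message lists found in the test log text."""
    passed, failed = [], []
    for line in log_text.splitlines():
        match = RESULT.search(line)
        if not match:
            continue
        detail = match.group(2).strip()
        if match.group(1) == "TEST_PASSED":
            passed.append(detail)
        else:
            failed.append(detail)
    return passed, failed


# Hypothetical log excerpt; only the TEST_FAILED line mirrors this post.
SAMPLE_LOG = '''
    "msg": "TEST_PASSED: package yyyyyy expected present "
    "msg": "TEST_FAILED: package xxxxxx expected present "
'''

passed, failed = summarize(SAMPLE_LOG)
print(f"{len(passed)} passed, {len(failed)} failed")
for message in failed:
    print("FAILED:", message)
```

A wrapper like this could refuse to continue whenever the failed list is not empty, which is the same decision I made by eye.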

If I had converted the host to Production, it would have caused big problems in my production environment.

After doing some research, I found that I had previously requested a newer version of an AMI (an Amazon Machine Image).

Although the entire "setup" and "maintain" playbooks ran flawlessly (with no errors), what I did not know was the newer AMI was missing a critical Operating System package that my environment needed.

I modified the "maintain" playbook to include the missing package, re-ran the "maintain" playbook and then re-ran the Test Playbook.  Everything passed. Now, I know the Staging and Prod machines will always be up-to-date with this package when the "maintain" playbook runs its continuous loop.

The new Jenkins Server was tagged as "Prod" and then the previous server deleted from AWS. The transition was painless.

By taking this approach and adding new checks to my tests as problems become evident, I ensure that I will not deploy something to production with a problem that I have already identified.

I will no longer have this issue or one related to missing this package again.  If an image already contains the missing package, no problem. It will simply pass. Ansible does not re-install packages if they already exist (unless the package state is set to "latest").



Brief History of the Test/Maintain/Govern Loop

The purpose of creating the Test/Maintain/Govern Loop for Playbooks was to show a Test-First approach to infrastructure delivery to make the transition to Infrastructure as code easier to get accustomed to.

The approach uses knowledge taken from years of insight from the software development world in delivery of complicated environments and applies it to the Infrastructure as Code domain.  


Technical Notes:

Jenkins CI server running in Production on AWS.  

Ansible Playbooks are used to Setup/Maintain and Test server(s) in both Staging and Production (how my environment works for build servers TODAY).

In AWS, tags are used to determine if a machine is "in production" or "in staging". They are both live in AWS in the same VPC (A VPC is like a private IP range within AWS for my hosts to reach each other).

Playbooks are written in YAML (a data serialization format) to have Dev/Staging/Prod in the same playbook.

A unique matching approach allows the same Playbook to run many times in Dev and Staging. This helps to ensure that when the Playbook runs on the Production machine, it has already executed many times (and been confirmed correct).

An often-missed feature of playbooks is that "If" statements (Ansible's "when" conditionals) can be used to determine which parts of a playbook are executed.  For example, a playbook task can be set to run only IF a certain environment applies.

When I want to upgrade my Jenkins server or reconfigure a new one, I take an approach of... "Build a new one, run the setup/maintain and test on it, and IF everything is OK, move it into the Production Tag and then disable the older server." This allows me to ensure all is well before activating a new production change.  

Think of the saying ....  

"All Servers Are Temporary"



If you feel that your organization could benefit from learning about a Test First approach to Infrastructure, please feel free to reach out to me.  I provide 1/2 day or full day sessions in the Toronto area or full-day sessions plus expenses anywhere else worldwide. 

A link to the original presentation can be found here..... 



If you are so inclined, here's a link to the root repository... 

A sample "test" (also used for governance) playbook is located here...    
https://github.com/MikeCaspar/playbook_test_framework/blob/master/sample/applicationX_proxy_servers_test.yml


Monday, November 28, 2016

Change can be fun or exciting


Over the last few weeks, I have seen a repeating theme in my social media feeds.

That theme... "Change is Hard" 

In my lifetime, I have been part of change, both positive and negative. 

Some of that change was imposed on me from above, and some from market forces. In some cases, personal interactions create change. My reaction to these different situations is very different for sure.

If you are a person who helps others to embrace or live through change (whatever your interpretation of change is)....

... consider the damage you are causing by inspiring fear where it simply may not be appropriate or necessary.


I can say from both personal and professional experience...


Change does not have to be hard.

It can be fun or exciting!


Please stop giving the impression that hard change is mandatory.