Monday, April 1, 2019

Back to Basics - Data

Two things happened recently that brought me to the point of writing this post …

  1. I was meeting with a candidate for an open role inside my organization and the individual began discussing how he tackles design and development. My ears perked up when I heard him mention that he creates a data map to understand what needs to happen to individual pieces of information as they move thru a proposed solution. Only when he understands the data does he feel he can make a judgement on whether the proposed solution will work or feel that he can properly define a solution if one doesn’t exist.
  2. I’ve been playing with a small code concept with a couple of buddies outside of work – we are all doing this just to explore some concepts and because at heart we still like to geek out. Amazing what happens when you're a coder, but you can’t touch the keys during your day job! Anyhow, we were having a chat about our little app concept and immediately all 3 of us began tracking the data elements we would need and marrying these data elements up to potential screen flows. The discussion always came back to the data, what data would be available at any given point and what manipulation of the data would need to take place between different visual aspects that the user would have access to in the app.

As many of you have read in previous columns, I’m a firm believer in understanding data. In my opinion, if you don’t understand the most discrete data elements being used by an application – where they came from, how they are being manipulated, what the contract is between functional elements of code and how they are presented to the user – you will never have a true grasp of the problem you are attempting to solve. And if you don’t understand the problem, how do you test for and claim success? So how do we, as professionals, ensure that we understand these discrete elements, and how can we use that information to harden our applications and ensure that we are producing rock solid systems?

Data elements can be broken into 2 distinct types that have meaning to a developer – stored (permanent) data elements and calculated (transient) data elements. Stored data elements are those pieces of information that are physically stored somewhere that the program can read from at any time and can write to as needed to update the value. Calculated data elements are transient: at some point in the program the values are identified (calculated), used and then discarded.

Stored Data Elements (Examples)

  1. Application Configuration Information: Service API Locations, Authentication Keys, Application Rules
  2. Multi-Tenant Configuration Information: Images, Application Rule Overrides (Customizations), Colors
  3. User Configuration Information: Colors, Alerts
  4. Data Retained in a Database: Customer Tables, Invoice Tables, Inventory Tables, Indexes, Relationships, Views

Calculated Data Elements (Examples)

  1. Days until loan payment due – calculated from stored Date Due
  2. Forcing a string value to upper case for comparison and/or display – uses stored values but does not need to be stored itself
  3. Calculating the payoff amount of an open loan if paid today – uses stored values but does not need to be stored anywhere
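
A minimal Python sketch of the first two calculated elements above. The function names and sample values are made up for illustration; the point is that each result is derived from stored values, used and then thrown away.

```python
from datetime import date


def days_until_due(date_due: date, today: date) -> int:
    """Transient value derived from the stored Date Due; negative when overdue."""
    return (date_due - today).days


def same_ignoring_case(a: str, b: str) -> bool:
    """Upper-case both stored strings for comparison only; nothing is written back."""
    return a.upper() == b.upper()


print(days_until_due(date(2019, 4, 15), date(2019, 4, 1)))  # 14
print(same_ignoring_case("Smith", "SMITH"))                 # True
```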

As developers, it is our role to understand how each distinct data element is received and what manipulation needs to take place before we display the value, store the value back, pass the data on to the next function or use it in another calculation. You need to understand the boundaries of what is acceptable and you need to plan for exceptions when the data does not fit within the defined boundaries, does not exist or causes the system to respond in an unplanned way.
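
Here is one way that boundary checking might look in Python. This is only a sketch: the `DataBoundaryError` class, the `parse_interest_rate` function and the 0 to 100 percent range are all assumptions chosen for illustration, not rules from any real system.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


class DataBoundaryError(ValueError):
    """Raised when a data element is missing or outside its defined boundaries."""


def parse_interest_rate(raw: Optional[str]) -> Decimal:
    # Assumed boundary: an annual rate expressed as a percentage from 0 to 100.
    if raw is None or raw.strip() == "":
        raise DataBoundaryError("interest rate is missing")
    try:
        rate = Decimal(raw.strip())
    except InvalidOperation:
        raise DataBoundaryError(f"interest rate is not a number: {raw!r}") from None
    # Reject NaN/Infinity first, then check the range.
    if not rate.is_finite() or not (Decimal("0") <= rate <= Decimal("100")):
        raise DataBoundaryError(f"interest rate out of range: {rate}")
    return rate


print(parse_interest_rate(" 5.25 "))  # 5.25
for bad in (None, "", "abc", "250"):
    try:
        parse_interest_rate(bad)
    except DataBoundaryError as exc:
        print("rejected:", exc)
```

Each of the three failure cases from the paragraph above (data outside the boundaries, data that does not exist, data that would otherwise blow up later) gets its own explicit, testable path.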

For my purposes, I take data mapping beyond the typical definition used in development. Most people will refer to data mapping as the exercise needed to understand the underlying data schema that an application utilizes – put very simply, the database(s), table(s) and field(s) needed by the application. I have always extended this definition. To me, data mapping includes the following information:

Data Schema – the database(s), table(s) and column(s)/field(s) touched by the application. It will include key information needed by the technical team – architects, leads, developers, quality analysts and test automation:

  1. Database configuration information: Server Location, Port Information, Access Authentication Information
  2. Table configuration information: Table Information, Relationship Information, Index Information, Column(s)/Field(s) Information (including constraints, defaults, boundaries, lists)
  3. Views needed to support the display and manipulation of data within the application
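
As a rough illustration of what the table-level part of a data map captures, here is a small sketch using Python's built-in `sqlite3` module and an in-memory database. The customer and invoice tables echo the examples earlier in this post, but every column name, default and constraint is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE invoice (
    invoice_id   INTEGER PRIMARY KEY,
    customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),  -- relationship
    invoice_date TEXT NOT NULL,
    total_due    NUMERIC NOT NULL DEFAULT 0 CHECK (total_due >= 0),  -- default + boundary
    status       TEXT NOT NULL DEFAULT 'open'
                 CHECK (status IN ('open', 'paid', 'void'))          -- allowed list
);
CREATE INDEX idx_invoice_customer ON invoice(customer_id);
CREATE VIEW open_invoices AS
    SELECT i.invoice_id, c.name, i.total_due
    FROM invoice i JOIN customer c ON c.customer_id = i.customer_id
    WHERE i.status = 'open';
""")

conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme')")
conn.execute(
    "INSERT INTO invoice (customer_id, invoice_date, total_due) "
    "VALUES (1, '2019-04-01', 125.50)"
)
try:
    # Violates the total_due boundary, so the database rejects it.
    conn.execute(
        "INSERT INTO invoice (customer_id, invoice_date, total_due) "
        "VALUES (1, '2019-04-02', -5)"
    )
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)

rows = conn.execute("SELECT name, total_due FROM open_invoices").fetchall()
print(rows)
```

Writing the constraints, defaults and allowed lists into the schema means the documentation and the enforcement live in the same place.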

Application Configuration Information – this is the stuff that allows the application to ‘prep’ itself, make all the connections it needs to various databases, services, file systems so that it can run. Typically this information is stored in the file systems – Config, Parameter, JSON, YAML, XML, etc.

  1. Where is the email server and the associated connection information that you need so your app can send emails? Server names, IP addresses, authentication information.
  2. Where is the message service and the associated connection information that you need so your app can send SMS messages? Service address, authentication information.
  3. Where is the logging service and the associated connection information that you need so that your app can log errors? Server names, IP addresses, authentication information.
  4. What are the default rules used by the application? Sometimes instead of hardcoding logic of how a piece of data will be manipulated, you define rules that the application will read and then use to process the data.
  5. What are the internal server addresses and connection information that the application needs to provide all planned functionality? Server names, IP addresses, authentication information.
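
A small sketch of how an application might 'prep' itself from a JSON config and fail fast if something from the list above is missing. The layout of the config, the host names and the `load_app_config` function are all invented for this example, and authentication details are left out of the sample.

```python
import json
from dataclasses import dataclass

# Services this hypothetical app cannot start without.
REQUIRED_SERVICES = ("email", "sms", "logging")

SAMPLE_CONFIG = """
{
  "services": {
    "email":   {"host": "mail.example.internal", "port": 25},
    "sms":     {"host": "sms.example.internal",  "port": 443},
    "logging": {"host": "logs.example.internal", "port": 514}
  }
}
"""


class ConfigError(Exception):
    """Raised when the application configuration is incomplete."""


@dataclass(frozen=True)
class ServiceEndpoint:
    host: str
    port: int


def load_app_config(text: str) -> dict:
    """Parse the config text and return a name -> ServiceEndpoint mapping."""
    services = json.loads(text).get("services", {})
    missing = [name for name in REQUIRED_SERVICES if name not in services]
    if missing:
        raise ConfigError("missing service config: " + ", ".join(missing))
    return {
        name: ServiceEndpoint(host=cfg["host"], port=int(cfg["port"]))
        for name, cfg in services.items()
    }


config = load_app_config(SAMPLE_CONFIG)
print(config["email"])
```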

Multi-Tenant Configuration Information – this is the stuff that allows the application to understand what ‘organization’ the user is accessing so that data is segmented properly and that the correct UX Schema and Functionality is made available. This information can be stored in the database or alternatively can be stored in the file systems – Config, Parameter, JSON, YAML, XML, etc. My personal preference is to store this stuff in the file system, not in the database, but each developer has their own feel for how they will store and manipulate this data.

  1. What are the default color schemes used in the UI? If a ‘tenant’ wants to override the colors, what colors do they want?
  2. Where are the default images used in the UI? If a ‘tenant’ wants to override them with their own images, where are they located?
  3. Where are the tenant application logging files going to be kept?
  4. Is there going to be a difference in what gets logged for specific ‘tenants’?
  5. Can an administrator turn on/off functionality for a specific ‘tenant’ – if so, this needs to get recorded somewhere so the system knows how to behave.
  6. What are the overrides for the Application Configuration Information? File Store changes, authentication keys, database specific configs, default rule overrides.
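
The default-plus-override pattern running through that list can be sketched as a recursive merge. The `merge_overrides` function, the keys and the values below are hypothetical; a real system would read both dictionaries from wherever it keeps its configuration.

```python
from copy import deepcopy


def merge_overrides(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: defaults with tenant overrides applied on top."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections so a tenant can override a single setting.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


app_defaults = {
    "colors": {"primary": "#003366", "accent": "#ff9900"},
    "logo": "images/default-logo.png",
    "features": {"export": True},
}
tenant_overrides = {
    "colors": {"primary": "#880000"},
    "features": {"export": False},
}

effective = merge_overrides(app_defaults, tenant_overrides)
print(effective["colors"])    # {'primary': '#880000', 'accent': '#ff9900'}
print(effective["logo"])      # images/default-logo.png
print(effective["features"])  # {'export': False}
```

The tenant only records what differs from the defaults, so anything it does not mention falls through to the application-level value.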

User Specific Configuration Information – this is the stuff that allows us to specifically configure the user experience, menu items, colors. Most people keep this stuff in the database, but it is possible to keep this in the file system – just remember, you need to be able to scale for the number of users you expect to access the system.

  1. Can the user override color schemes?
  2. Does the system allow the user or an administrator to customize the menu options available to the user?

Calculated Elements

  1. Understand where in the app the calculated values will be used.
  2. Document the calculated values that the system will need.
  3. Example: Current Loan Payoff Value – the system stores the following information: Current Loan Outstanding Amount, Loan Interest Rate. The system will calculate the following information: Daily Interest, Number of Days Interest In Current Period, Current Interest Due, Current Loan Payoff Amount
  4. Example: Average Sales for Current Month – the system stores the following information: Date of Invoice, Invoice Total Amount Due. The system will calculate the following information: Sum of Invoice Total Amount Due, Count of Invoices For Current Month, Average Sale for Current Month
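
The two examples above can be worked through in a few lines of Python. Treat this as a sketch under stated assumptions: I'm assuming simple daily interest on a 365-day year and rounding to the cent, and the function names and sample figures are made up. A real loan product would define its own interest rules.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def loan_payoff(outstanding: Decimal, annual_rate_pct: Decimal, days_in_period: int) -> Decimal:
    """Current Loan Payoff Amount = outstanding amount + current interest due."""
    # Assumption: simple interest accrued daily over a 365-day year.
    daily_interest = outstanding * annual_rate_pct / Decimal(100) / Decimal(365)
    interest_due = (daily_interest * days_in_period).quantize(CENT, rounding=ROUND_HALF_UP)
    return outstanding + interest_due


def average_sale_for_month(invoices, year: int, month: int):
    """Average of Invoice Total Amount Due for one month, or None if no invoices."""
    totals = [
        total
        for invoice_date, total in invoices
        if (invoice_date.year, invoice_date.month) == (year, month)
    ]
    if not totals:
        return None  # avoid dividing by a zero invoice count
    return (sum(totals, Decimal(0)) / len(totals)).quantize(CENT, rounding=ROUND_HALF_UP)


# 10,000.00 outstanding at 7.3% for 10 days: 2.00 per day, 20.00 interest due.
print(loan_payoff(Decimal("10000.00"), Decimal("7.3"), 10))  # 10020.00

invoices = [
    (date(2019, 4, 2), Decimal("100.00")),
    (date(2019, 4, 20), Decimal("250.00")),
    (date(2019, 3, 30), Decimal("999.00")),  # previous month, excluded
]
print(average_sale_for_month(invoices, 2019, 4))  # 175.00
```

None of the intermediate values (daily interest, invoice count, sum) is stored anywhere; they exist only long enough to produce the number the user sees.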

What I’ve discussed above is at the 50k foot view. Specifically on the database design stuff – as you begin to utilize apps out there that allow you to describe your databases, tables, columns, relationships – you’ll see that there are a lot of additional descriptive elements that these programs allow you to document. You’ll need to decide how much of that functionality you take advantage of, what value it brings to the team and for future support purposes. Remember, think thru how someone will maintain your design and code 18 months from now and what information they will need to make their lives easier.

As you are sketching out your thoughts on your application, this is the stuff that will ensure that you get it right: that you understand what is happening as the data moves thru your system and how you can test the application to ensure that you’re handling the data properly. If you are the entire team – then you can keep track of this stuff with open source tools and/or spreadsheets. If you’re working with a team – get agreement upfront on the tools you’re going to use to manage and share this information – whether they are proprietary tools or open source tools.

All of the above are living elements that need to be continuously updated throughout the life of the development cycle and the application lifecycle. As you actually get into the application design and lay out the specific logic flows associated with user interactions, you will most likely end up changing information at all of the layers discussed – modifying and changing tables/columns as well as potentially adding new tables/columns. Your configuration specific data and rules data will also change – keep the documentation up to date so that when you have to go back and support it months or years after it was originally developed, you’ll understand what all of the discrete data elements are, why they are there, how they are used, where they are used and what calculations are used to manipulate the data.

As you work thru identifying the above information, you’ll naturally begin to identify the class definitions you’ll need and a skeleton of the properties and methods that will make up the class. Document as much of this as possible up front; it will save you countless hours, not only as you create the app initially but, more importantly, as you come back to maintain it.
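
To show how a data map can fall out into a class skeleton, here is a hypothetical `Loan` class in Python. The field names, the boundary rule and the methods are assumptions for illustration only: stored elements become properties, and calculated elements become methods.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass
class Loan:
    # Stored data elements (these would map to table columns).
    loan_id: int
    outstanding_amount: Decimal
    annual_rate_pct: Decimal
    date_due: date

    def __post_init__(self) -> None:
        # Boundary taken from the data map (an assumed rule for this sketch).
        if self.outstanding_amount < 0:
            raise ValueError("outstanding_amount cannot be negative")

    # Calculated data elements: derived on demand, never stored.
    def days_until_due(self, today: date) -> int:
        return (self.date_due - today).days

    def is_overdue(self, today: date) -> bool:
        return self.days_until_due(today) < 0


loan = Loan(1, Decimal("10000.00"), Decimal("7.3"), date(2019, 4, 15))
print(loan.days_until_due(date(2019, 4, 1)))  # 14
print(loan.is_overdue(date(2019, 4, 16)))     # True
```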

If you'd like more information on my background: LinkedIn Profile