During the workday, laboratory personnel generate data and information. That intellectual material forms the basis of decisions about product quality, research directions, someone’s health, and other matters. Data and information must be entered into reports, computer systems, databases, and software packages to be useful. While that seems like a trivial task, it isn’t, not because typing a few keystrokes is complicated but because the problems resulting from errors can be severe. Did you ever get an insurance claim rejected because someone entered a code or other information incorrectly? Mistakes that affect our daily lives occur in credit card databases, Federal Bureau of Investigation (FBI) records, and Medicaid databases. As such, this article aims to explore the causes and ramifications of data entry errors in laboratory work, along with methods of avoiding them.
Likelihood and cost of data entry errors
Spreadsheets are standard tools in lab work. The error rate of data entry can be high. For example, the probability of inaccurate data entry is 18 to 40 percent for a simple spreadsheet and 100 percent for a complex spreadsheet. There are two types of data entry errors: transcription and transposition. A transcription error results from incorrectly reading the values or letters to be input, whereas a transposition error results from interchanging the positions of correct letters or values. Transcription errors are not limited to human data entry but can occur when using optical character recognition software.
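The difference between the two error types can be illustrated with a short sketch. It assumes that a transposition means exactly two adjacent characters swapped and treats every other mismatch as a transcription error; the function name and that narrow definition are ours, for illustration only.

```python
def classify_entry_error(expected: str, entered: str) -> str:
    """Classify a mismatch between the source value and the typed value."""
    if expected == entered:
        return "match"
    # A transposition keeps the same characters but swaps two adjacent ones.
    if len(expected) == len(entered):
        diffs = [i for i, (a, b) in enumerate(zip(expected, entered)) if a != b]
        if (
            len(diffs) == 2
            and diffs[1] == diffs[0] + 1
            and expected[diffs[0]] == entered[diffs[1]]
            and expected[diffs[1]] == entered[diffs[0]]
        ):
            return "transposition"
    # Wrong, missing, or extra characters are treated as transcription errors.
    return "transcription"


print(classify_entry_error("7.35", "7.53"))  # transposition
print(classify_entry_error("7.35", "7.85"))  # transcription
```

A check like this only works when the correct value is available for comparison, for example during a second-person review against the instrument printout.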
In the early 1990s, George Labovitz and Yu Sang Chang created the 1-10-100 rule, which states that the cost of a data entry error multiplies tenfold at each stage it passes through before being identified and corrected. If it costs you a dollar to fix a data entry error as soon as it’s made, it will cost $10 at the next step of the process, perhaps when it is used as part of a calculation. If the error persists and is reported as part of an analytical sample report, it may cost $100 to fix, plus the embarrassment caused by the error. Those dollar figures are in 1992 valuations; $100 then is equivalent to $214.43 in 2023 dollars.
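As a rough illustration of that arithmetic, the sketch below applies the rule's multipliers and the inflation ratio implied by the figures above. The stage labels and function name are our own, and the dollar amounts are the rule's nominal values rather than measured costs.

```python
# Cost multipliers from the 1-10-100 rule, keyed by the stage at which the error is caught.
STAGE_COST = {"at entry": 1, "next process step": 10, "reported result": 100}

# Ratio implied by the article's figures: $100 in 1992 is $214.43 in 2023.
INFLATION_1992_TO_2023 = 214.43 / 100


def correction_cost(stage: str, errors: int = 1, adjust_for_inflation: bool = False) -> float:
    """Estimate the cost of fixing `errors` mistakes caught at the given stage."""
    cost = float(STAGE_COST[stage] * errors)
    if adjust_for_inflation:
        cost *= INFLATION_1992_TO_2023
    return round(cost, 2)


print(correction_cost("at entry", errors=5))
print(correction_cost("reported result", errors=5))
print(correction_cost("reported result", adjust_for_inflation=True))
```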
The 1-10-100 rule can also have an impact on data integrity. In 2020, the Food and Drug Administration’s (FDA’s) inspection program frequently identified data integrity violations, including:
- Deleted or manipulated data,
- Aborted sample analysis without justification,
- Invalidated out-of-specification (OOS) results without justification,
- Destroyed or lost data,
- Non-contemporaneous work documentation, and
- Uncontrolled documentation.
Avoiding data entry errors
Data entry errors and modifications contribute to data integrity issues. Earlier, we noted two common sources of errors: data transcription and data transposition. Other examples include:
- Inaccurate data inputs (copying data from an instrument to a notebook and then entering it into a form; three opportunities for error on the same data element);
- Errors in data formatting (copying data into the wrong field on a spreadsheet); and
- Unit and format inconsistencies (e.g., mixed measurement units or mistakes in date and time formats; some spreadsheets reformat dates if left to their default settings, causing the value to be misread).
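One way to guard against unit and date-format problems is to accept only one explicit format at the point of entry. The sketch below assumes dates are stored as year-month-day and masses in milligrams; the conversion table and function names are hypothetical and would be replaced by whatever conventions your lab uses.

```python
from datetime import date, datetime

# Hypothetical conversion factors to a single storage unit (milligrams).
TO_MILLIGRAMS = {"mg": 1.0, "g": 1000.0, "ug": 0.001}


def parse_entry_date(text: str) -> date:
    """Accept only year-month-day input so 03/04/2023 is never guessed at."""
    return datetime.strptime(text.strip(), "%Y-%m-%d").date()


def to_milligrams(value: float, unit: str) -> float:
    """Convert a mass to milligrams, rejecting units that are not on the list."""
    try:
        return value * TO_MILLIGRAMS[unit]
    except KeyError:
        raise ValueError(f"unknown unit: {unit!r}") from None


print(parse_entry_date("2023-03-04"))
print(to_milligrams(2.5, "g"))
```

Rejecting an ambiguous entry such as "03/04/2023" outright, rather than interpreting it, pushes the correction back to the moment of entry, where it is cheapest to fix.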
Data misinterpretation (confusing letter “O” for a zero) can also be a problem. When printing data, it should be presented in fonts that clearly distinguish between letters and numbers that might be misread, e.g., by using a slashed zero. Typefaces commonly found on personal computers that use the slashed zero include:
- Terminal, in Microsoft’s Windows line;
- Consolas, in Microsoft’s Windows Vista, Windows 7, Microsoft Office 2007, and Microsoft Visual Studio 2010;
- Menlo, in macOS;
- Monaco, in macOS;
- SF Mono, in macOS;
- Liberation (variant), in the Fedora Linux distribution, though not present in other Linux distributions;
- ProFont; and
- Roboto Mono.
Other combinations that can be confused are the numbers “1” and “7” along with the letter “l”, either in the upper or lower case depending on the font. When software is designed for laboratory use, numerical and alphanumerical data fields should use appropriately clear fonts to minimize confusion.
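Software can also flag look-alike characters for the reviewer's attention. The following is a minimal sketch that assumes the confusable sets are "O" and "0", and "1", "7", lowercase "l", and uppercase "I"; the groupings and the function name are illustrative, not a standard.

```python
# Look-alike groups based on the characters discussed above (an assumed, partial list).
LOOKALIKE_GROUPS = ("O0", "1lI7")


def flag_lookalikes(value: str):
    """Return (position, character) pairs for characters that are easily misread."""
    return [
        (index, char)
        for index, char in enumerate(value)
        if any(char in group for group in LOOKALIKE_GROUPS)
    ]


print(flag_lookalikes("S-O17"))
```

A data entry screen could highlight the flagged positions so that the person reviewing the entry looks at them twice.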
Guidelines from federal agencies and other organizations recommend that a second individual review and sign off on any manually entered data. This may seem expensive and cumbersome to implement, but given the cost of errors, timely review and correction can save money, effort, and embarrassment. Computer applications receiving data (e.g., LIMS and spreadsheets) should use color-coded cells to help identify data elements that have or have not been reviewed.
In addition, you should ensure that you have enough staff to do the work. Overworked and stressed lab personnel are more likely to make errors and to miss them; the work environment can make a significant difference in performance and results. Data review should be considered a normal part of the work routine and not just one more thing to be added to a loaded schedule. Laboratory personnel put the work into performing tests and experiments. The final step of having a second set of eyes review the data entries for accuracy is a small add-on effort to prevent that work from being wasted by typing errors.
Be sure lab personnel are appropriately trained and know what errors to look for. In addition, efforts should be made to continually improve the data entry process. Considering the time people spend on administrative work and data entry, this is no small contribution to improving overall performance.
Avoiding data entry errors by using laboratory informatics systems
There are methods of avoiding data entry errors, and we’ll look at three of them: drop-down or pull-down menus, barcodes/RFID, and electronic data capture.
Drop-down or pull-down menus
Common to most applications, websites, and spreadsheets, menus are activated by clicking or hovering over an element on the screen. A menu of choices, and often sub-menus, is revealed. Click on an item, and a data element enters a field, or an action occurs. They are commonly used when data must be entered (e.g., the name of a state or a type of organism), and the list of possible responses is known and limited. As a means of entering data into a field, menus of this sort can be helpful for a number of reasons, including:
- All the choices are shown, making it a matter of simply picking the right one;
- It avoids misspellings and encourages normalized formatting; and,
- It limits the choices of responses for a data field.
When a drop-down menu is constructed using HTML on a website, for example, the list elements contain two parts: the text displayed in the menu and the data element entered in the field. If we were entering the name of a state, “Massachusetts” for example, that word would be shown in the menu, and the data element could be entered as “MA”. For programmers, that is important because the field size is limited to two characters (all states have two-character abbreviations), and they don’t have to check to see if the response is one of a set of correct responses. If the data field were open to direct typing, the responses could include “Massachusetts,” “Mass.,” “MA,” “Ma,” or a misspelling. If you, as a user, were searching for samples that came from that state, you’d have to search for all of those variations; with the two-character substitution, all you need to search for is “MA.”
The entry has to be checked to ensure the correct choice was made during data entry.
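The display-text and stored-value pairing described above can be sketched outside of HTML as a simple lookup. The three-state menu, the record layout, and the function names are hypothetical and shown only to illustrate the idea.

```python
# Display text shown in the menu -> value stored in the field (subset for illustration).
STATE_MENU = {"Connecticut": "CT", "Massachusetts": "MA", "New Hampshire": "NH"}


def store_state(selection: str) -> str:
    """Return the two-character code for a menu selection."""
    if selection not in STATE_MENU:
        raise ValueError(f"{selection!r} is not one of the menu choices")
    return STATE_MENU[selection]


def samples_from(samples, state_code):
    """Find samples by the stored code; no spelling variants to search for."""
    return [s for s in samples if s["state"] == state_code]


samples = [
    {"sample_id": "S-001", "state": store_state("Massachusetts")},
    {"sample_id": "S-002", "state": store_state("Connecticut")},
]
print(samples_from(samples, "MA"))
```

The menu guarantees a valid value, not the right one: picking "Connecticut" when the sample came from Massachusetts still stores a well-formed code, which is why the entry still needs review.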
Barcodes and radio frequency ID chips (RFID)
Almost everything has a barcode on it or on the item’s packaging. There’s a good reason: barcodes are an easy-to-use data entry method that is fast, far less error-prone than manual entry, and avoids typing. In addition, they are easy to produce on a high-quality printer or graphics display. Preparing a sample label with a barcode is a standard part of most laboratory information management systems (LIMS). It can include as much or as little information as needed within a sample container’s label space.
Barcode sheets, letter-sized printouts of barcodes unique to your laboratory, and the text they represent can be used as a physical menu, avoiding typing where programmed drop-down menus aren’t available. Of course, entries from these sheets need to be checked to make sure that the proper selection has been made.
There are a variety of barcode technologies available depending on your needs. The most common are one-dimensional (1D) or linear barcodes. These can hold up to 85 characters, providing useful information. For example, the initial sample submitted for testing would contain the sample ID. If aliquots are taken, they can include the parent sample ID with an extension for that aliquot. Two-dimensional (2D) barcodes can hold up to 7,089 characters (the maximum for a QR code containing only digits) and include sample IDs, submitter contact information, safety concerns, and storage location if special handling is needed (e.g., biobanking applications).
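The parent-plus-extension scheme for aliquots might look like the sketch below. The "-A" separator, the two-digit extension, and the function names are assumptions for illustration; the 85-character limit is the 1D capacity quoted above.

```python
from typing import Optional, Tuple

SEPARATOR = "-A"      # hypothetical convention: parent ID, "-A", aliquot number
MAX_1D_LENGTH = 85    # linear barcode capacity quoted in the text


def aliquot_id(parent_id: str, number: int) -> str:
    """Build an aliquot ID from the parent sample ID plus a numeric extension."""
    label = f"{parent_id}{SEPARATOR}{number:02d}"
    if len(label) > MAX_1D_LENGTH:
        raise ValueError("label is too long for a linear barcode")
    return label


def parse_label(label: str) -> Tuple[str, Optional[int]]:
    """Split a scanned label into (parent ID, aliquot number or None)."""
    parent, sep, extension = label.rpartition(SEPARATOR)
    if sep and extension.isdigit():
        return parent, int(extension)
    return label, None


print(aliquot_id("S-2023-0042", 3))
print(parse_label("S-2023-0042-A03"))
print(parse_label("S-2023-0042"))
```

Keeping the parent ID inside the aliquot label means a single scan is enough to trace an aliquot back to the original sample.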
In addition to sample IDs, barcodes can be useful in asset tracking and inventory management applications.
While barcodes can be very useful, we need to consider another point: what they are printed on. While most applications would be satisfied with paper stick-on labels, some applications require special attention. For example, suppose samples are kept in special environmental conditions such as freezers, high humidity, or warmer than normal storage. In that case, attention has to be paid to the substrate the labels are made from, the method of adhesion, and the printing technology so that the labels remain on the containers and are readable.
In some cases, such as hazardous environments or samples in freezers that should not be handled frequently, radio frequency identification (RFID) chips might be a better fit. These are commonly used in highway toll-taking, where the chip is read as you drive under a reader. RFID chips are read without contact and don’t need line-of-sight access to work, as barcodes do. That means you can quickly inventory a biobanking freezer or a large sample or materials storage facility. The downside is that the chips can be costly.
Electronic data capture
Another approach is to avoid manual data entry entirely. It relies on the instrument or instrument data system (IDS) connecting directly to a data storage system like a LIMS. Consider two possibilities: a pH meter, and an instrument with its own data system (e.g., a spectrophotometer or chromatograph).
The pH meter has two operating modes. In manual (front-panel) mode, the analyst controls everything, and the results of the measurements are shown on a numeric display. The alternative is programmed access to the device from a computer, either through the rear-panel connections, which can include ports for serial ASCII, USB, digital I/O, and Ethernet, or through Bluetooth and Wi-Fi connections.
Using rear-panel connections to a computer allows the analyst to have a program running through a LIMS or a standalone software application to control the device electronically. One example of this is an automated titrator controlled by a microprocessor. The computer-instrument combination would require either a person or a robot to manage the physical operations of placing samples and standards in the device. In either case, the computer reads the measurements by sending a command to the device and receiving the response. It would then perform any necessary calculations and prepare a table of samples, calculations, and results that can automatically be entered into a database such as a LIMS. This eliminates manual data entry and provides a basis for automation and increased productivity.
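The command-and-response cycle can be sketched with a simulated meter standing in for a real device. The "READ?" command, the "pH 7.02" reply format, and the class and function names are all invented for this example; a real meter's command set comes from its vendor documentation, and the connection would go through a serial or network library rather than a Python object.

```python
class SimulatedPHMeter:
    """Stand-in for a meter on a serial or network port; replies are canned."""

    def __init__(self, readings):
        self._readings = iter(readings)

    def query(self, command: str) -> str:
        # Hypothetical protocol: "READ?" returns the next reading as "pH <value>".
        if command != "READ?":
            return "ERR"
        return f"pH {next(self._readings):.2f}"


def read_ph(meter) -> float:
    """Send the read command and parse the numeric value out of the reply."""
    reply = meter.query("READ?")
    label, _, value = reply.partition(" ")
    if label != "pH":
        raise ValueError(f"unexpected reply: {reply!r}")
    return float(value)


def run_samples(meter, sample_ids):
    """Build the table of sample IDs and results that would be sent to the LIMS."""
    return [{"sample_id": sample_id, "pH": read_ph(meter)} for sample_id in sample_ids]


meter = SimulatedPHMeter([7.02, 4.01])
for row in run_samples(meter, ["S-001", "S-002"]):
    print(row)
```

Because the value travels from the device reply straight into the results table, nobody retypes it, which removes the transcription and transposition opportunities described earlier.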
An IDS works similarly, with computers taking on more of the work. At the analyst’s request, a LIMS would prepare a worklist of samples to be processed and send it to the IDS that controls the instrument and associated autosampler. The autosampler would likely have a barcode reader to verify the sample being processed. The computer attached to the instrument would process the results, format a table of samples and results and send it to the LIMS for automated data entry. Again, this means no typing. The benefit is higher productivity, more efficient workflow, and lower cost of operations compared to manual method execution.
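A minimal sketch of that worklist flow, including the barcode check at the autosampler, follows. The scanner and measurement functions are passed in as placeholders, and the record fields are hypothetical; a real IDS and LIMS would exchange this information through their own interfaces.

```python
def process_worklist(worklist, scan_barcode, measure):
    """Run each worklist entry, measuring only when the scanned barcode matches."""
    results = []
    for position, expected_id in enumerate(worklist, start=1):
        scanned = scan_barcode(position)
        if scanned != expected_id:
            # Wrong vial in this position: record the problem instead of a result.
            results.append(
                {"sample_id": expected_id, "status": "barcode mismatch", "result": None}
            )
            continue
        results.append(
            {"sample_id": expected_id, "status": "ok", "result": measure(position)}
        )
    return results


# Simulated autosampler tray and instrument readings, keyed by tray position.
tray = {1: "S-001", 2: "S-003"}
readings = {1: 0.52, 2: 0.61}

for row in process_worklist(["S-001", "S-002"], tray.get, readings.get):
    print(row)
```

In this run the second position holds the wrong sample, so no result is recorded for it and the mismatch is reported back for someone to resolve.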
While data entry is a normal part of laboratory work, how you do it matters in terms of overall operational performance. Labs may begin with manual data entry, but they will quickly discover that the cost of that process is going to be excessive in the long run. This is another example where automation using laboratory informatics systems like an electronic laboratory notebook (ELN) or a LIMS like LabLynx’s ELab LIMS can streamline lab operations, reduce overall operating costs, and make better use of people’s skills and capabilities.