How can science majors build a profession in Lab Informatics?

How can science majors build a profession in Lab Informatics? | LabLynx Resources

An undergraduate science education aims to teach people what they need to know to pursue a particular scientific discipline, emphasizing its foundational elements. In current science education, the time allotted to teaching a discipline is usually insufficient to cover both its existing and growing knowledge base and the multidisciplinary aspects of executing laboratory work in industrial and academic settings; the focus is primarily on academic topics. Yet employers in science-based industries want to hire people who are “ready to work,” leaving a significant gap between the goals of science education and the background needed to be productive in the workplace.

The purpose of this guide is to give students a look at the laboratory informatics landscape in industrial labs. It has two goals:

  • Provide a framework to help readers understand what they need to know to be both comfortable and effective in an industrial setting, and,
  • Give instructors an outline for a survey course, should they want to teach the material.

This article is intended as an annotated map of this portion of the technological world, identifying critical points of interest, how they relate, and recommendations for learning more. The intent is that, from one document, you can appreciate the technologies and, when you hear their names, understand what they do and where they fit.

Topics that will be covered:

  • Science vs. lab operations, models, and processes
  • Levels of knowledge
    • Competent user
    • Administrator
    • Support (Laboratory Systems Engineers[1], IT specialists)
    • Developer
  • Laboratory Informatics tools and technologies
    • Laboratory information management systems (LIMS)
    • Electronic laboratory notebooks (ELN)
    • Scientific data management systems (SDMS)
    • Laboratory execution systems (LES)
    • Instrument data systems (IDS)
    • Robotics
    • Data acquisition and control
    • Data governance and data integrity
    • Systems development: tinkering vs. engineering
  • Regulatory aspects
    • Sources of regulations
    • Why validation is essential
  • Artificial intelligence (AI)
  • Industrial examples

Note: this document will refer to “,” an educational forum for laboratory informatics that will ask you to sign in to access its contents. There is no charge for accessing or using any of the materials; the login is for security purposes. The sign-in can be done with various existing social media accounts, or you can create a new one.


The science you’ve learned in school provides a basis for understanding laboratory methods, solving problems, conducting projects and research, and developing methods. It has little to do with the orchestration of industrial lab operations. That role was first filled by paper-based procedures and is now filled by laboratory informatics tools such as laboratory information management systems.
Why is your laboratory experience in formal education different from that in industrial labs? Both settings can include research, testing, chemistry, biotechnology, pharmaceuticals, material development, engineering, etc.

Educational lab work is about understanding principles and techniques, developing skills by executing procedures, and conducting research projects. Industrial work is about producing data, information, materials, and devices, some supporting research, others supporting production/manufacturing operations. Those products are subject to both regulatory review and internal guidelines. If a regulatory inspector finds fault with a lab’s data, the consequences can range from more detailed inspections to closing the lab or the entire production facility until remedial actions are taken.

Because of the importance of data integrity and quality, industrial laboratories operate under regulations from a variety of sources, including corporate guidelines, the Food and Drug Administration (FDA), the Environmental Protection Agency (EPA), the International Organization for Standardization (ISO), and others. These regulatory efforts aim to ensure that the data used to make decisions about product quality is defensible and stands up to challenges. While regulatory requirements and corporate guidelines strongly motivate a well-organized and managed laboratory, they aren’t the only important factors. It all comes down to product quality and safety.

Consider a lawsuit brought by a consumer over product quality. If the company can’t demonstrate that the data supporting product quality is on solid ground, it may be liable for significant damages.

As noted, industrial labs are production operations. There is more leeway in research, but service labs (analytical, physical properties, quality control, contract testing, etc.) are heavily production-oriented; some refer to their work as “scientific manufacturing/production” because of its heavy reliance on automation. That dependence on automation has led to the adoption of systems such as LIMS, ELN, SDMS, IDS, and robotics to organize and manage the work and produce results. Areas of research where large volumes of sample processing are essential face the same issues.

Operational models for research and service laboratories

Before we get too deeply into the material, we need to describe the setting where laboratory informatics tools are used. Organizational processes describe how a business works and how the various departments relate. Laboratory work is process-driven at several levels: some processes describe how the lab functions and carries out its intended purpose, and others detail how experimental procedures are carried out. Some of these processes, accounting for example, are the same across organizations in different industries. Others depend on the industry and are the reason industry experience is required before hiring people at mid and upper levels. Still others depend on the particular mission of a lab within an organization; research is a key example. Research labs’ internal methods of operation can vary widely.
We’ll begin by looking at the working environment. Figure 1 shows the functions we need to consider. That model is based on the author’s experience; however, it fits many applied research groups in different industries whose work is intended to lead to new/improved products, and the functional behavior will be similar.


Figure 1 – An organizational structure of an R&D group


The labs I worked in supported R&D in polymers and pharmaceuticals – the overarching company had broad interests. The research labs (left column of Figure 1) were focused on those projects. The other facilities consisted of:

  • Analytical research – this lab had four functions – routine chemical analysis in support of the research labs, new method development to support both research and production QC, non-routine analytical work to address special projects, and monitoring testing accuracy of the production QC labs (several production facilities were making different products).
  • Physical Properties Research – similar in function to the analytical lab but emphasizing measuring the physical properties of polymers instead of chemical analysis.
  • Material Characterization – this group worked with research and special projects looking at the composition of polymers and their properties, such as rheology, molecular weight distribution, and other characteristics.
  • Fabrication – this facility processed experimental polymers into blends, films, and other components that could be further tested in the physical properties lab.

Once an experimental material reached a stage where it was ready for scale-up development, it entered the pilot plant, where production processes were designed and tested to see if the material could be made in larger quantities and still retain its desirable properties. A dedicated testing lab supported the pilot plant to do raw materials, in-process, and post-production testing. If a product met its goals, it was moved to a production facility for larger-scale testing and, eventually, commercial production.

Intra-lab workflows

Let’s look at each of these support lab groups more closely and examine how their workflows relate.

Analytical research

The workflows in this lab fall into two categories: routine testing (i.e., the service lab model) and research. In the routine testing portion, samples can come from the research labs, production facilities, and the pilot plant testing lab. The research work can come from salespeople (e.g., “We found this in a sample of a competitive product, what is it?”, “Our customer asked us to analyze this,” etc.), customer support trying to solve customer issues, and researchers developing test methods to support research. The methods used for analysis can come from various sources depending on the industry, e.g., ASTM[2], journals, vendors, and intra-organizational sources.

Physical properties research

The work here is predominantly routine testing (i.e., the service lab model). Although samples can come from various sources, as with analytical research, the test methods are standardized and come from groups like ASTM and, in some cases, from customers of the company’s products. Standardized procedures allow results to be compared with testing done by other organizations, including potential customers. Labs like this are found in various industries, including pharmaceuticals, where the lab might be responsible for tablet uniformity testing, among other things.

Materials characterization

As noted, this lab performed work that fell between the analytical and physical properties labs. While their test protocols were standardized within the labs, the nature of the materials they worked on involved individual considerations on how the analysis should be approached and the results interpreted. At one level, they were a service lab and followed that behavior. On another, testing execution required more than “just another sample” thinking.


The fabrication facility processed materials from various sources: evaluation samples from both the production facility and pilot plant, competitive material evaluation, and the research labs. They also did parts fabrication for testing in the physical properties lab. Some physical tests require plastic materials formed into unique shapes; for example, tensile bars are used for tensile strength testing (test bars are stretched to see how they deform and eventually fail). The sample sizes they worked with ranged from a few pounds to thousands of pounds (e.g., film production).

The pilot plant testing lab evaluated materials from scaled-up processing. It had to be located within the pilot plant for fast turnaround testing, including on-demand work and routine analysis. It also serviced process chromatographs for in-line testing. Its test procedures came from the chemical and physical testing labs, since it was responsible for a variety of tests on small samples; anything larger was sent to the analytical research labs. The pilot plant testing lab followed a service lab model.

On the service lab model and research in general

The service lab model has been noted several times and is common in most industries. The details of sample types and testing will vary, but the operational behavior will be the same:

  • Samples are submitted for testing. In many labs, this is done on paper forms listing sample type, testing to be done, whom to bill, and a description of the sample and any unique concerns or issues. In labs with a LIMS, this can be done online by lab personnel or the sample submitter.
  • The work is logged in (electronically with a LIMS, manually with paper systems), and rush samples are brought to management’s attention.
  • Analysts generate worklists and perform the required analyses, and results are recorded in the LIMS or lab notebooks.
  • The work is reviewed and approved for release and, in paper systems, recorded on the submission forms.
  • Reports are sent to the submitter, electronically or by whatever method the submitter requested.
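To make the sequence above concrete, here is a minimal Python sketch of the service lab model as a state machine. The status names, the `SampleRequest` class, and the rule that approval needs a result for every requested test are illustrative assumptions, not any particular LIMS product's design; a real LIMS adds billing, review roles, and reporting on top of this.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    LOGGED_IN = "logged in"
    IN_ANALYSIS = "in analysis"
    APPROVED = "approved"
    REPORTED = "reported"


# Each status may only move to the next step of the service lab model.
NEXT_STEP = {
    Status.SUBMITTED: Status.LOGGED_IN,
    Status.LOGGED_IN: Status.IN_ANALYSIS,
    Status.IN_ANALYSIS: Status.APPROVED,
    Status.APPROVED: Status.REPORTED,
}


@dataclass
class SampleRequest:
    sample_id: str
    sample_type: str
    tests: list[str]
    bill_to: str
    rush: bool = False
    status: Status = Status.SUBMITTED
    results: dict[str, float] = field(default_factory=dict)

    def advance(self) -> None:
        """Move to the next step; approval requires a result for every requested test."""
        if self.status not in NEXT_STEP:
            raise ValueError(f"{self.sample_id} is already {self.status.value}")
        if self.status is Status.IN_ANALYSIS:
            missing = [t for t in self.tests if t not in self.results]
            if missing:
                raise ValueError(f"cannot approve {self.sample_id}; missing {missing}")
        self.status = NEXT_STEP[self.status]


def worklist(requests: list[SampleRequest], test: str) -> list[str]:
    """Logged-in or in-progress samples still needing `test`, rush samples first."""
    pending = [r for r in requests
               if r.status in (Status.LOGGED_IN, Status.IN_ANALYSIS)
               and test in r.tests and test not in r.results]
    # sorted() is stable; key False (rush) sorts before True (not rush).
    return [r.sample_id for r in sorted(pending, key=lambda r: not r.rush)]
```

Refusing to skip steps is the point of the design: a sample cannot be reported without passing through analysis and approval.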

Work from non-routine samples may be logged in under “special projects,” though it may create the need for additional testing.

There is no similar model for research work besides project descriptions, initial project outlines, etc. The nature of the work will change as the project progresses and more is learned. Recording results, observations, plans, etc., requires a flexible medium capable of maintaining notes, printouts, charts, and other forms of information. As a result, ELNs are modular systems consisting of a central application that can link to various functional modules such as graphics, statistics, molecular drawing, reaction databases, user-defined database structures to hold experimental data, etc. For additional details, see “The Application of Informatics to Scientific Work: Laboratory Informatics for Newbies” in “LIMSjournal – Laboratory Technology Special Edition[3]”.
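The modular arrangement described above can be sketched as a small core that stores notebook entries and calls registered modules. `NotebookCore`, `register`, and `summary_stats` are hypothetical names used only for illustration; commercial ELNs link their modules through their own vendor-specific mechanisms.

```python
import statistics
from typing import Any, Callable


class NotebookCore:
    """Hypothetical ELN core: free-form entries plus pluggable modules."""

    def __init__(self) -> None:
        self.modules: dict[str, Callable[..., Any]] = {}
        self.entries: list[dict[str, Any]] = []

    def register(self, name: str, module: Callable[..., Any]) -> None:
        """Link a functional module (statistics, drawing, database lookup...)."""
        self.modules[name] = module

    def note(self, project: str, text: str) -> None:
        self.entries.append({"project": project, "kind": "note", "content": text})

    def run(self, project: str, module: str, *args: Any) -> Any:
        """Run a linked module and keep its output in the notebook."""
        output = self.modules[module](*args)
        self.entries.append({"project": project, "kind": module, "content": output})
        return output


def summary_stats(values: list[float]) -> dict[str, float]:
    # Stand-in for a statistics module; a real ELN might link an external package.
    return {"n": len(values), "mean": statistics.mean(values),
            "stdev": statistics.stdev(values)}
```

Keeping module output as notebook entries means the analysis stays with the observations that motivated it.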


Figure 2 – Tiered structure of laboratory informatics


The role of laboratory informatics

Laboratory informatics has several tiers of systems that can be applied to lab work to make it more effective and efficient.

  • The top tier consists of electronic laboratory notebooks (ELNs) and laboratory information management systems (LIMS)[4]
  • Supporting those are scientific data management systems (SDMS)
  • Data and information generation systems, such as laboratory execution systems (LES) and instrument data systems (IDS), form the next tier. IDSs combine instruments and computer systems. Typical examples are chromatographic data systems (CDS) connected to one or more chromatographs, a mass spectrometer connected to a dedicated computer, and almost any major instrument in an instrument-computer combination. CDSs are, at this point, unique in their ability to support multiple instruments.
  • Sharing that third tier are devices like pH meters and balances, which have no associated databases and must be programmed to work with upper-tier systems. Their output can be entered manually into a LIMS, ELN, or LES, but in regulated labs, the entry has to be verified by a second individual.
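For the manual-entry case just described, a minimal sketch of second-person verification might look like this. The `ManualEntry` class and its rule that the verifier must be a different person who confirms the same reading are assumptions for illustration, not a requirement quoted from any specific regulation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ManualEntry:
    """A value transcribed by hand from a device with no data interface."""
    device_id: str           # hypothetical ID for a balance or pH meter
    value: float
    units: str
    entered_by: str
    entered_at: datetime
    verified_by: Optional[str] = None
    verified_at: Optional[datetime] = None

    def verify(self, reviewer: str, observed_value: float) -> None:
        """Second-person check: a different person confirms the transcribed value."""
        if reviewer == self.entered_by:
            raise PermissionError("verifier must be a different person")
        if observed_value != self.value:
            raise ValueError("transcribed value does not match the source reading")
        self.verified_by = reviewer
        self.verified_at = datetime.now(timezone.utc)

    @property
    def usable(self) -> bool:
        """Only verified entries should flow into LIMS/ELN calculations."""
        return self.verified_by is not None
```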

Our initial concern will be with the top-tier systems. The primary interaction between a service lab and someone requesting their services is shown in Figure 3.


Figure 3 – Sample request/results flow


Research groups and other groups submit samples, and each request proceeds through the system described above. The split in the LIMS shown in Figure 2 illustrates the separation between logging samples in, carrying out the analysis, and using the LIMS as an administrative tool for completing the work request and returning results to the submitter.




Figure 2 shows the classical assignment of laboratory informatics products to labs: LIMS to service labs and notebooks such as ELNs to research labs. That assignment is an oversimplification; the systems have broader usage in both types of laboratory workflows.

In the research labs, work may generate a large amount of testing that has to be done quickly so that the next steps in the experiments can be determined. The demand may be great enough to swamp the service labs, leaving them unable to provide the turnaround time research needs. A LIMS could be added to the research lab’s laboratory informatics tools to perform high-demand testing within those labs. Other, less demanding testing would be submitted to the service labs. The research lab LIMS could be an independent installation or work with the same core system as the service labs. The choice would depend on the locations of the labs, the need for instrument connections, how cooperative the groups are, and corporate politics.

The analytical research and materials characterization labs in Figure 1 could justify an ELN based on method development research. In addition to providing a means of detailing the research needed to create a new procedure, the ELN would need access to a variety of databases and tools, including chemical reaction modeling, molecular structure representation, published methods retrieval, etc., as well as any corporate research databases that could exist in a large organization.

The fabrication facility could use an ELN to record the progress of any non-routine work being done. Equipment operating conditions, problems encountered and solved, and the results of their processing would be noted. This could be coordinated with the pilot plant or product development group’s work.

Laboratory Informatics tools

Laboratory operations can be divided into two levels:

  • Management (operations, data management, etc.): the laboratory informatics tools at this level consist of LIMS, ELN, and SDMS.
  • Data/information generation: this is where lab procedures are executed; the laboratory informatics tools consist of LES, IDS, and support for automation and devices such as pH meters, scales, etc.

Operations Management

The need: laboratory operations management is responsible for keeping track of everything that goes on, including:

  • Personnel records – people’s qualifications, personnel files (along with human resources), vacation schedules, education/training, etc.
  • Equipment lists and maintenance – scheduled maintenance, repairs, calibration, qualification for use, software upgrades if appropriate.
  • Documentation – all lab documents, reports, guidelines, sample records, problems, method descriptions, contacts with vendors, etc.
  • Sample processing – what samples need work, the scope of the work, results, reports, etc.
  • Inventory of materials and equipment – not only what is on hand but where it is, its age (some materials, such as prepared reagents, have a limited valid lifetime; others may have a limited shelf life), and any special handling instructions, such as for storage and disposal.
  • Special conditions needed for organisms: their maintenance, condition, records, etc.
  • Data governance;
  • Data integrity;
  • Meeting regulatory requirements and avoiding regulatory issues, including preparation for regulatory audits[5].
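As one example of the record keeping listed above, an inventory item with a location, an age, and a limited valid lifetime can be sketched as follows. `InventoryItem`, `expiring_soon`, and the seven-day warning window are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class InventoryItem:
    name: str
    location: str
    prepared_on: date
    valid_days: int          # assumed valid lifetime or shelf life, in days
    handling: str = ""       # storage or disposal notes

    @property
    def expires_on(self) -> date:
        return self.prepared_on + timedelta(days=self.valid_days)

    def is_expired(self, today: date) -> bool:
        # Usable through the expiry date itself, expired the day after.
        return today > self.expires_on


def expiring_soon(items: list[InventoryItem], today: date,
                  within_days: int = 7) -> list[InventoryItem]:
    """Items already expired or expiring within the window, soonest first."""
    horizon = today + timedelta(days=within_days)
    flagged = [i for i in items if i.expires_on <= horizon]
    return sorted(flagged, key=lambda i: i.expires_on)
```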

Many of those terms will be familiar to you, others less so. We’ll cover them below. The primary point of lab operations is to produce reliable and supportable data and information, which will then be used to support production and research operations. People must trust that the analysis was performed correctly and can be relied upon to make decisions about product quality, whether production operations are under control, or the next step in a research program.

Data Governance: Data governance refers to the overall management of the availability, usability, integrity, and security of the data employed in an enterprise. It encompasses processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals. Its primary purpose is to ensure that data serves the organization’s needs and that there’s a framework in place to handle data-related issues.

Data Integrity: Data integrity refers to the accuracy and consistency of data over its lifecycle, focused more on the validity and reliability of data rather than the broader management processes. It ensures that data remains unaltered and uncorrupted unless there’s a specific need for a change. The primary purpose is to ensure that data is recorded exactly as intended and, upon retrieval, remains the same as when it was stored.

While data governance is a broader concept that deals with the overall management and strategy of data handling in an organization, data integrity is specifically about ensuring the accuracy and reliability of the data. Both are essential for making informed decisions based on the data and ensuring that the data can be trusted.
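One common technical aid to data integrity, sketched here under the assumption that records are simple dictionaries, is to store a cryptographic fingerprint alongside each record so that any later alteration can be detected. It uses Python's standard `hashlib` and `json` modules; `fingerprint`, `store`, and `unchanged_since_storage` are hypothetical helper names. A fingerprint detects changes but does not prevent them, and it says nothing about whether the value was accurate in the first place.

```python
import hashlib
import json


def fingerprint(record: dict) -> str:
    """SHA-256 over a canonical JSON form (sorted keys, fixed separators)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def store(record: dict) -> dict:
    # Keep a copy of the record together with its fingerprint at storage time.
    return {"record": dict(record), "sha256": fingerprint(record)}


def unchanged_since_storage(stored: dict) -> bool:
    """True if the record still matches the fingerprint taken when it was stored."""
    return fingerprint(stored["record"]) == stored["sha256"]
```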

This subject is growing in importance. A framework called ALCOA-CCEA[6] has been created to help define and guide work on data integrity. The letters[7] stand for Attributable, Legible, Contemporaneous, Original, Accurate, Complete, Consistent, Enduring, and Available. For additional information, see “Assuring Data Integrity for Life Sciences” [8].
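Several ALCOA-CCEA attributes are procedural, but a few can be checked mechanically when a result is recorded. The sketch below is hypothetical: the required field names and the 15-minute limit for "contemporaneous" are assumed lab policy, not values taken from ALCOA-CCEA or any regulation.

```python
from datetime import datetime, timedelta

REQUIRED_FIELDS = ("sample_id", "test", "value", "units", "recorded_by",
                   "observed_at", "recorded_at")
MAX_RECORDING_DELAY = timedelta(minutes=15)   # assumed lab policy, not a standard


def alcoa_findings(rec: dict) -> list[str]:
    """Return problems found by a few mechanical ALCOA-style checks."""
    findings = []
    missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
    if missing:
        findings.append(f"Complete: missing {missing}")
    if not rec.get("recorded_by"):
        findings.append("Attributable: no user recorded")
    obs, rec_at = rec.get("observed_at"), rec.get("recorded_at")
    if obs and rec_at and rec_at - obs > MAX_RECORDING_DELAY:
        findings.append("Contemporaneous: recorded too long after observation")
    if rec.get("transcribed") and not rec.get("source_ref"):
        findings.append("Original: transcribed value has no link to its source")
    return findings
```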

Regulatory Issues: regulations on lab operations come from a variety of sources, but their purpose is the same: to ensure that lab operations are well managed and that laboratory data and information are reliable. The regulations cover personnel and their certifications/qualifications, equipment, reagents, and the validity of lab processes, to name a few points. They require that the equipment you use is suitable for its purpose, that instruments have been calibrated and are in good working order, and that all lab processes have been validated. That last phrase is an interesting one, and one that causes confusion about what it means and to whom it applies.

In the late 1970s, when the original FDA guidelines for manufacturing and laboratory practices appeared, there was considerable concern about what “validation” meant and to what it applied. The best description of the term is “documented proof that something [a process] works.” “Validation” is applied only to processes; equipment used during a process has to be qualified for use. In short, if you are going to carry out a procedure, you have to have documented evidence that the procedure does what it is supposed to do and that the tools used in its execution are appropriate for the need they are to fill. If you are going to generate data and information, do it according to a proven process. This was initially applied to manufacturing and production, including testing and quality control, but has since been used more broadly. There were questions about its application to research, since the FDA didn’t have oversight of research labs, but in industrial circles those concerns have largely disappeared. The one area of research where the FDA does have formal oversight is clinical trials, which are still part of the R&D process. Product development and production involving human, animal, or food safety, for example, will be subject to regulatory review.

If you produce data and information, do it through a proven methodology; how else can you trust the results?

The sources of regulations and standards include:

  • Regulations such as 21 CFR Part 11[9], created by agencies such as the Food and Drug Administration (FDA);
  • Standards, like the ISO 9000 family and ISO/IEC 17025, created by groups such as the International Organization for Standardization (ISO). The ISO 9000 family is a set of five quality management system (QMS) standards that help organizations ensure they meet customer and other stakeholder needs within statutory and regulatory requirements related to a product or service. ISO/IEC 17025[10], General requirements for the competence of testing and calibration laboratories, is the primary standard used by testing and calibration laboratories. In most countries, it is the standard to which most labs must be accredited to be deemed technically competent;
  • Guidance on topics like Good Automated Manufacturing Practice (GAMP), created by entities such as the International Society for Pharmaceutical Engineering (ISPE);
  • The Environmental Protection Agency (EPA);
  • “Standards and Certification: Laboratory Requirements” (42 CFR 493), developed by the Centers for Medicare & Medicaid Services (CMS) for Clinical Laboratory Improvement Amendments (CLIA) laboratories, and
  • Guidelines and accreditation requirements developed by the American Association for Laboratory Accreditation (A2LA).

The difference between standards and regulations is that regulations have the force of law behind their enforcement.[11] Standards are consensus agreements that companies, industries, and associations use to define “best practices.” Guidelines are proposed standards of organizational behavior that may eventually become regulations. The FDA regulations, for example, began as guidelines, underwent discussion and modification, and eventually became regulations under 21 CFR Part 11. For a detailed discussion of one particular area, see “What standards and regulations affect a materials testing laboratory?”[12]

Data governance, data integrity, and regulatory compliance are, to an extent, matters of laboratory culture. Your organization has to instill practices that contribute to meeting the requirements. Laboratory informatics tools can provide the means for executing the tasks that meet those needs, but first and foremost, this is a personnel consideration.

There are four options for meeting those needs:

  • Paper-based record systems using forms and laboratory notebooks;
  • Spreadsheet software;
  • Laboratory information management systems (LIMS), and,
  • Electronic laboratory notebooks (ELNs).

The details for these are noted in “The Application of Informatics to Scientific Work: Laboratory Informatics for Newbies” in “LIMSjournal – Laboratory Technology Special Edition[13]” mentioned earlier. We’ll note some key points here.

Systems like LIMS, laboratory information systems (LIS), ELNs, and LES can help a lab meet regulatory requirements, guidelines, and standards by providing tools that satisfy enforcement requirements. For example, one common requirement is an audit trail for data and information. The details of audit trails are covered a couple of paragraphs further on. The four product classes noted have audit trails built in; efforts to build a sample tracking system in a spreadsheet often lack that capability because it would be difficult to implement. Beyond that, the original LES, as envisioned by VelQuest, was intended to provide documented proof that a lab procedure was properly executed by qualified personnel using qualified equipment and reagents, with documentation of every step followed along with the data and information collected. This, again, was designed to meet the expectations for a well-run lab.
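The LES idea described above (qualified people, qualified equipment and reagents, and a record of every step) can be sketched as a guard that refuses to record a step unless those conditions hold. `LabContext`, `ExecutionRecord`, and `run_step` are hypothetical names; this is not how VelQuest's or any other vendor's product was implemented.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class LabContext:
    trained: dict[str, set[str]]       # analyst -> procedures they are qualified to run
    calibration_due: dict[str, date]   # instrument -> date its calibration expires
    reagent_expiry: dict[str, date]    # reagent lot -> expiry date


@dataclass
class ExecutionRecord:
    procedure: str
    analyst: str
    log: list[str] = field(default_factory=list)


def run_step(ctx: LabContext, rec: ExecutionRecord, today: date, text: str,
             instrument: Optional[str] = None, reagent: Optional[str] = None,
             value: Optional[str] = None) -> None:
    """Log one procedure step only if analyst, instrument, and reagent all qualify."""
    if rec.procedure not in ctx.trained.get(rec.analyst, set()):
        raise PermissionError(f"{rec.analyst} is not qualified for {rec.procedure}")
    # Unknown instruments or lots default to date.min, i.e. treated as unqualified.
    if instrument is not None and ctx.calibration_due.get(instrument, date.min) < today:
        raise RuntimeError(f"{instrument} is not within calibration")
    if reagent is not None and ctx.reagent_expiry.get(reagent, date.min) < today:
        raise RuntimeError(f"reagent lot {reagent} is expired or unknown")
    entry = f"{today.isoformat()} {rec.analyst}: {text}"
    if value is not None:
        entry += f" = {value}"
    rec.log.append(entry)
```

Because a failed check raises before anything is logged, the execution record only ever contains steps that met the qualification rules.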

Paper-based systems

At one time, paper-based systems were all we had, and they could deal with all of the issues noted as they existed up to the early 1970s. Then, at the end of that decade, the demands of lab operations and growing regulatory compliance created a need for better tools. Compliance with laboratory practices had depended on enforcement by the lab or organization; failures in that enforcement led to the development of a formal, enforced regulatory program.

The structure of lab operations changed with the increasing availability of electronic data capture; having measurements in electronic form made them easier to work with if you had the systems in place to do so. Paper-based systems didn’t lend themselves to that. You had to write results down on paper; to use them, you had to copy them to other documents or re-enter them into a program. In many cases, paper was all people could afford, but that was a false economy, since computer-based systems saved considerable time and effort. Electronic systems also gave lab personnel a wider range of data analysis options, yielding more comprehensive work.


Pros:
  • Low cost;
  • Easily transportable;
  • No power required;
  • Built-in backup copies;
  • High level of flexibility (writing, drawing, charts, etc.), and,
  • Signature block for author/witness.

Cons:
  • May be difficult to read;
  • Easily damaged or destroyed;
  • No or limited loss prevention;
  • Manual searches;
  • Data/information must be copied to be used;
  • Difficult to integrate with other users’ data;
  • High cost of use, since everything is manual; and,
  • Audit practices are manual and either self-enforced or organizationally enforced.


Word processors are one useful step up from paper-based systems but lack some of the flexibility of paper. Another drawback is regulatory/guideline compliance. Since everything can be edited, an external mechanism has to be used to sign and witness entries. One possible workaround is to print off each day’s work and have that signed. That inherits the problems with managing paper and makes audit trails difficult.

An audit trail is a tool for keeping track of changes to data and information. People make mistakes, something changes, and an entry in a notebook or electronic system has to be corrected or updated. In the laboratory environment, you can’t simply erase something or change an entry. Regulatory compliance, organizational guidelines, and data integrity requirements prevent that, which is part of the reason entries have to be made in ink, so that alterations can be detected. Changes are made in paper notebooks by lightly crossing out the old data (it still has to be readable), writing the updated information, noting the date and time of the change and why it was made, and having the new entry signed and witnessed. The resulting record of changes is called an audit trail; it ensures that results aren’t improperly altered. In paper-based systems, this depends on voluntary compliance and should be enforced at the organizational level. Electronic laboratory systems do it as part of their design.
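A minimal sketch of a built-in audit trail, assuming a record is a dictionary of named values: every change goes through `update`, which keeps the old value, the new value, who made the change, when, and why. `AuditedRecord` and `AuditEntry` are hypothetical classes; electronic signatures and witnessing are omitted, and a real system would also protect the underlying storage, which Python attributes alone cannot do.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass(frozen=True)
class AuditEntry:
    field_name: str
    old: Any
    new: Any
    changed_by: str
    changed_at: datetime
    reason: str


class AuditedRecord:
    """A record whose values are changed only through an audited update."""

    def __init__(self, values: dict, created_by: str) -> None:
        self._values = dict(values)
        self._trail: list[AuditEntry] = []
        self.created_by = created_by

    def get(self, name: str) -> Any:
        return self._values[name]

    def update(self, name: str, new: Any, user: str, reason: str) -> None:
        # Like the paper practice: no change without a stated reason.
        if not reason.strip():
            raise ValueError("a reason is required for every change")
        old = self._values[name]
        self._trail.append(AuditEntry(name, old, new, user,
                                      datetime.now(timezone.utc), reason))
        self._values[name] = new

    @property
    def trail(self) -> tuple:
        return tuple(self._trail)   # read-only view; entries themselves are frozen
```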

Spreadsheet-based systems

Spreadsheets have benefited many applications that need easy-to-produce calculation and database systems for home, office, and administrative work. The word “laboratory” doesn’t appear in that sentence because of the demands for data integrity in laboratory applications. Spreadsheets make it easy to produce calculations, graphs, etc., but their ease of use and openness to undocumented modification make them a poor choice for routine laboratory calculations. They’re attractive because of their simplicity and power but inappropriate because they lack controls over editing, audit trails, etc. If you need a routine calculation package, it should be built according to standard software development guidelines. One major drawback is the difficulty of validating a spreadsheet, instituting controls, and ensuring that the scripting hasn’t been tampered with.

This is particularly true of laboratory database systems such as LIMS, inventory systems, and the other needs noted earlier. Spreadsheets don’t accommodate any of those needs on their own; they are tools, and the applications have to be built.

Spreadsheet LIMS project development requires a formal software development effort; it’s not just building a spreadsheet and populating it with data and formulas. Current industrial lab operations are subject to corporate and regulatory guidelines, and violating them can have serious repercussions, including shutting down the lab or production facility. Software development requires documentation: user requirements documents, design documents, user manuals, and other materials to support the software. What looks like a fun project may become a significant part of your work. That affects cost: the development and support effort may end up occupying all of your time.

Performance is another matter. Spreadsheet systems are typically single-user-at-a-time. They don’t have the underlying database management support needed for simultaneous multi-user operations. This is one reason spreadsheet implementations give way to LIMS in industrial labs.

“Implementing a Service Lab’s Support Systems: From Paper to Spreadsheets to LIMS” is a document that will help you consider a direction for laboratory informatics development and planning; it describes the strengths and weaknesses of different approaches. It also contains references to regulatory compliance documents on the use of spreadsheets.

Laboratory Information Management Systems and Electronic Laboratory Notebooks

The commercial software market has evaluated the needs of research and service laboratory operations and developed software specifically designed to address those points.

LIMS was designed to meet the requirements of service laboratories such as quality control, contract testing, analytical labs, and similar groups. What matters is not the name of the lab but its operational characteristics, meaning the processes it uses, as described earlier. The same software could be equally at home in a research lab that conducts a lot of routine testing on samples. If a lab has sets of samples and needs to keep track of testing, generate worklists, and report results on a continuing basis, a LIMS may well be appropriate for its work. LIMS is designed for a highly structured operational environment, as described in the “On the service lab model and research in general” section earlier.

ELNs are useful in a less structured operational environment, where needs and direction may change and data storage and analysis requirements are more fluid. Such labs are usually designated as “research,” but not exclusively. For example, an analytical lab doing method development could use an ELN to support that work.


  • LIMS has a well-defined database structure and software components to work with it and support its underlying processes.
  • It is purpose-built to manage sample login, sample tracking, results entry, results tracking, reporting, etc. These comprise its core functionality.
  • In addition, a LIMS may include instrument maintenance databases, trend analysis, inventory, personnel records, etc.
  • ELNs have modules that can be organized to support projects and manage data according to the research process used in that lab’s work.
  • Some ELNs, at the vendor’s discretion, contain graphics modules, inventory modules, and links to third-party applications such as reaction databases, molecular modeling, and statistical analysis.
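A tiny sketch can make the “well-defined database structure” concrete. It covers the core LIMS functions listed above: sample login, worklist generation, results entry, and reporting. It uses Python’s built-in sqlite3 module. The table names, columns, and status values are hypothetical, and commercial LIMS schemas are far richer. A production system would also sit on a multi-user database server rather than an in-memory SQLite file.

```python
import sqlite3

# Hypothetical minimal schema: one row per sample, one row per test requested on it.
SCHEMA = """
CREATE TABLE sample (
    sample_id TEXT PRIMARY KEY,
    material  TEXT NOT NULL,
    status    TEXT NOT NULL DEFAULT 'logged_in'
);
CREATE TABLE test (
    test_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id TEXT NOT NULL REFERENCES sample(sample_id),
    method    TEXT NOT NULL,
    result    REAL,
    status    TEXT NOT NULL DEFAULT 'pending'
);
"""


def open_lims(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def login_sample(conn, sample_id, material, methods):
    """Sample login: register the sample and every test requested on it, atomically."""
    with conn:  # commits on success, rolls back if any insert fails
        conn.execute("INSERT INTO sample (sample_id, material) VALUES (?, ?)",
                     (sample_id, material))
        conn.executemany("INSERT INTO test (sample_id, method) VALUES (?, ?)",
                         [(sample_id, m) for m in methods])


def worklist(conn, method):
    """Pending tests for one method, in login order."""
    return conn.execute(
        "SELECT test_id, sample_id FROM test "
        "WHERE method = ? AND status = 'pending' ORDER BY test_id", (method,)
    ).fetchall()


def enter_result(conn, test_id, value):
    """Results entry; the sample is marked complete once no tests remain pending."""
    with conn:
        conn.execute("UPDATE test SET result = ?, status = 'complete' WHERE test_id = ?",
                     (value, test_id))
        conn.execute(
            "UPDATE sample SET status = 'complete' "
            "WHERE sample_id = (SELECT sample_id FROM test WHERE test_id = ?) "
            "AND NOT EXISTS (SELECT 1 FROM test t "
            "WHERE t.sample_id = sample.sample_id AND t.status = 'pending')",
            (test_id,))


def report(conn, sample_id):
    """Results for one sample, in the order the tests were requested."""
    return conn.execute(
        "SELECT method, result FROM test WHERE sample_id = ? ORDER BY test_id",
        (sample_id,)).fetchall()


conn = open_lims()
login_sample(conn, "S-001", "tablet", ["assay", "moisture"])
login_sample(conn, "S-002", "tablet", ["assay"])
print(worklist(conn, "assay"))   # [(1, 'S-001'), (3, 'S-002')]
```

Logging in a sample and all of its tests inside one transaction is the design choice worth noticing. A partially registered sample never appears on a worklist.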


Both LIMS and ELNs are “top-tier” software. A LIMS may be subordinate to an ELN in a research lab whose work includes routine testing. Some functionality in the “Needs” list is not part of these products, since functionality is determined by the vendor, usually with customer input. That functionality can be supplied by third-party vendors, either as applications linked to the LIMS and ELN or as completely independent products.

Supporting tiers of software

There are limits to what a vendor can or wants to include in their products. The more functionality, the higher the cost to the customer and the more complex the support issues become. Data storage is one of those areas.

Results storage in a LIMS is usually limited to sets of numerical values or alphanumeric strings (the color of something, for example). Those results are often based on the analysis of larger data files from instruments, images, or other sources, and those files are too large to be accommodated within a LIMS. That problem has been solved in at least two ways: Scientific Data Management Systems (SDMS) and Instrument Data Systems (IDS). From the standpoint of data files, the two have some elements in common, though IDSs are more limited.

Scientific Data Management Systems act as large file cabinets. Different types of material can be put into them (images, charts, scanned documents, data files, etc.) and then referenced by the LIMS or ELN. The results sit in the LIMS or ELN, and the supporting information is linked to in the SDMS. That keeps the LIMS and ELN databases easier to structure while still supporting the lab’s need to manage large files from a variety of sources and formats.
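The division of labor described above can be sketched as a content-addressed file store: the result lives in the LIMS or ELN, and the bulky supporting file lives in the SDMS. This is a hypothetical illustration, not how any particular SDMS is built. Files are stored under their SHA-256 hash, and the LIMS record keeps only the short reference string. Retrieval re-checks the hash, so a silently altered file is caught. The `sdms:` reference format and the class name are assumptions.

```python
import hashlib
import tempfile
from pathlib import Path


class FileCabinet:
    """Hypothetical SDMS-like store that keeps files under their SHA-256 content hash."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def put(self, data: bytes, kind: str) -> str:
        digest = hashlib.sha256(data).hexdigest()
        path = self.root / digest
        if not path.exists():            # identical content is stored only once
            path.write_bytes(data)
        return f"sdms:{kind}:{digest}"   # the short reference the LIMS/ELN keeps

    def get(self, ref: str) -> bytes:
        digest = ref.rsplit(":", 1)[1]
        data = (self.root / digest).read_bytes()
        # Re-hash on retrieval so a file changed after storage is detected.
        if hashlib.sha256(data).hexdigest() != digest:
            raise ValueError("stored file no longer matches its reference")
        return data


cabinet = FileCabinet(tempfile.mkdtemp())
raw = b"time_s,signal_v\n0.00,0.051\n0.01,0.052\n"   # stand-in for a large instrument file
lims_result = {"sample_id": "S-001", "test": "assay", "value": 99.1,
               "raw_data": cabinet.put(raw, "chromatogram")}
```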

The same file-storage facility is available through instrument data systems, but only for the devices the IDS is connected to (chromatographs, spectrometers, etc.). In addition, IDSs support instrumental data collection, analysis, and reporting. They can also be linked to LIMS and ELNs, electronically receiving worklists (lists of samples to be processed) and electronically sending the results back. This avoids the need to enter the data manually.
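The worklist round trip between a LIMS and an IDS can be sketched with plain CSV text. The column names and file layout here are assumptions for illustration; real IDS and LIMS products define their own interfaces. The key design choice is that results are matched back against the worklist that was sent. Anything unexpected is rejected for human review rather than guessed at.

```python
import csv
import io


def export_worklist(tests):
    """LIMS -> IDS: one CSV row per (sample_id, method), numbered by autosampler position."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["position", "sample_id", "method"])
    for position, (sample_id, method) in enumerate(tests, start=1):
        writer.writerow([position, sample_id, method])
    return buf.getvalue()


def import_results(worklist_csv, results_csv):
    """IDS -> LIMS: accept results only for (sample, method) pairs that were on the worklist."""
    expected = {(row["sample_id"], row["method"])
                for row in csv.DictReader(io.StringIO(worklist_csv))}
    accepted, rejected = {}, []
    for row in csv.DictReader(io.StringIO(results_csv)):
        key = (row["sample_id"], row["method"])
        if key in expected:
            accepted[key] = float(row["value"])
        else:
            rejected.append(row["sample_id"])   # left for a person to investigate
    return accepted, rejected


worklist_csv = export_worklist([("S-001", "assay"), ("S-002", "assay")])
results_csv = "sample_id,method,value\nS-001,assay,99.1\nS-009,assay,50.0\n"
print(import_results(worklist_csv, results_csv))   # ({('S-001', 'assay'): 99.1}, ['S-009'])
```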

Laboratory Execution Systems (LES) help ensure that laboratory methods and procedures are carried out correctly, with documented support for each step in the method. They can be seen as quasi-automation systems, except that instead of automation hardware doing the work, you do it, guided step by step by the method as embedded in the software. This type of system was originally styled as an “electronic laboratory notebook” by Velquest Inc. (since purchased by Accelrys). The problem it set out to solve was providing documented evidence that every step in a lab procedure was properly carried out by qualified personnel. That evidence covers the use of calibrated and maintained instruments and current reagents, with all data captured electronically and potentially linked to a LIMS or other systems. Such a system provides bulletproof support for the data and information generated by procedures in regulated environments. That in turn means the data and information produced are reliable and can stand up to challenges.
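The core idea of an LES can be sketched in a few lines: check qualifications, calibration status, and reagent currency before each step, and capture the step’s data as it goes. Everything here is hypothetical: the class names, the dictionaries standing in for training, calibration, and reagent records, and the `record_value` callback. A real LES would pull those records from validated systems and capture electronic signatures.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Optional


@dataclass
class Step:
    text: str
    instrument: Optional[str] = None   # instrument ID the step uses, if any
    reagent: Optional[str] = None      # reagent lot the step uses, if any


@dataclass
class LabRecords:
    qualified_analysts: set   # analysts trained on this procedure
    calibration_due: dict     # instrument ID -> date its calibration expires
    reagent_expiry: dict      # reagent lot -> expiry date


def run_procedure(steps, records: LabRecords, analyst: str, today: date,
                  record_value: Callable[[int], float]):
    """Execute steps in order; stop at the first step whose preconditions fail."""
    if analyst not in records.qualified_analysts:
        raise PermissionError(f"{analyst} is not qualified for this procedure")
    log = []
    for number, step in enumerate(steps, start=1):
        # Unknown instruments or lots default to date.min, i.e. they are treated as expired.
        if step.instrument and records.calibration_due.get(step.instrument, date.min) < today:
            raise RuntimeError(f"step {number}: {step.instrument} is not in calibration")
        if step.reagent and records.reagent_expiry.get(step.reagent, date.min) < today:
            raise RuntimeError(f"step {number}: reagent lot {step.reagent} has expired")
        # record_value stands in for the analyst (or an instrument) supplying the step's data.
        log.append({"step": number, "text": step.text, "analyst": analyst,
                    "instrument": step.instrument, "reagent": step.reagent,
                    "value": record_value(number)})
    return log


records = LabRecords({"jdoe"}, {"BAL-01": date(2030, 1, 1)}, {"LOT-7": date(2030, 1, 1)})
steps = [Step("Weigh 100 mg of sample", instrument="BAL-01"),
         Step("Dissolve in 50 mL diluent", reagent="LOT-7")]
log = run_procedure(steps, records, "jdoe", date(2025, 6, 1), lambda n: 0.0)
```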

Implementations of LES have ranged from stand-alone applications linked to LIMS, ELNs, and SDMS, to programmable components of LIMS and ELNs. Instead of a stand-alone LES framework such as Velquest’s product, embedded systems provide a scripting facility (OK, programming by the user) within a LIMS that has access to the entire LIMS database. This avoids the need to interface two products. That embedded facility puts a lot of pressure on the programmer to properly design, test, and validate each procedure in an isolated environment, separate from the active LIMS, so that programming errors don’t compromise the integrity of the database. Note: procedure execution in stand-alone systems requires validation as well.

Instrument Data Systems

In the late 1960s, instrument vendors explored the benefits of connecting laboratory instruments to computer systems to see what could be gained from that combination. One problem they wanted to address was handling the large volumes of data that instruments produced. At that time, analog instrument output was often recorded on chart paper, and a single sample run through a chromatograph might produce a chart one or more feet long. A day’s work generated a lot of paper that had to be evaluated manually, which was very time-consuming and labor-intensive. Vendors successfully developed computer-instrument combinations that automatically converted the instrument’s analog output to digital form, then stored and processed that data to produce useful information (results). This eliminated the manual effort and made labs more cost-effective in the process. It was the first step in instrument automation, but the user was still required to introduce the sample into the device. The second step solved that problem in many instances through automatic samplers that moved samples into position for analysis. Depending on the analytical technique, these could include auto-injectors, flow-through cells, etc.

Autosamplers ranged from devices that used a syringe to draw a portion of a sample from a vial and inject it into the instrument, to systems that carried out some sample preparation tasks before injecting the sample[14], to pneumatic-tube-based systems[15]. The pneumatic systems brought vials from a central holding area, injected the sample, and then returned the sample to holding, where it was available for further work or disposal.

The key to all this was the IDS computer and its programming, which coordinated these activities, acquired the data, and then processed it. Life was good as long as human intelligence evaluated the results, looked for anomalies, and took remedial action where needed. For a more detailed discussion of this subject, please see “Notes on Instrument Data Systems.”

The instrument data system took us a long way toward an automated laboratory, but there were still some major hurdles to cross, including sample preparation, sample storage management, and systems integration. While much has been done, there is a lot more work to do.

Getting work done quickly and at a low cost

Productivity is one of the driving factors in laboratory work, as in any production environment. What is the most efficient and effective way to accomplish high-quality work at the lowest possible cost? That may not sound like “science,” but it is about doing science in an industrial world, in both research and service labs. That need drives software development, new systems, better instrumentation, and investments in robotics and laboratory informatics technologies.

That need was felt most acutely in the clinical chemistry market in the 1980s. In that field, the price charged for testing was fixed by an annual contractual agreement between the lab and its clients. If costs increased during that contract year, the increase cut into the lab’s income, profits, and ability to function. Through an industry-wide effort, the labs agreed to pursue the concept of Total Laboratory Automation (TLA). That solution involved the labs working with vendors and associations to create a set of communications standards and standardized testing protocols. These enabled the clinical chemistry labs to contain costs and greatly increase their operational effectiveness. The communications protocols made it possible to integrate instrumentation and computer systems, streamlining operations and data transfer. The standardized test protocols allowed vendors to develop custom instrumentation tailored to those protocols and to have personnel educated in their use.[16]

The traditional way of improving productivity in industrial processes, whether on the production line or in the laboratory, is through automation. This is discussed in detail in two articles:

  • Elements of Laboratory Technology Management, and,
  • Considerations in the Automation of Laboratory Procedures.

Connecting an IDS to a LIMS, SDMS, or ELN is one form of automation. Information about samples and the work to be done can travel automatically from a LIMS to an IDS, be processed, and the results returned without human intervention.

Successful automation requires several conditions:

  • Proven, validated procedures/methods;
  • A proven, validated automated version of those methods (note: the method doesn’t have to be re-validated, but the implementation of the method does)[17];
  • A clear economic justification for the automation, including enough work to make automating the process worthwhile; and,
  • Standardization in communications and equipment where possible. This is one of the key success factors in clinical chemistry’s TLA program and the reason for the success of automation and equipment development in processes using microplates.
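The “clear economic justification” condition is often checked with simple arithmetic. The sketch below assumes straight-line cost spreading with no discounting. Every figure in the usage lines is a made-up placeholder, not data from any lab. It estimates how many samples per year are needed for automation to break even, and how long the capital outlay takes to pay back at a given workload.

```python
def break_even_samples(capital_cost, annual_support, manual_cost_per_sample,
                       automated_cost_per_sample, years):
    """Samples/year at which per-sample savings cover capital (spread over `years`) plus support."""
    saving = manual_cost_per_sample - automated_cost_per_sample
    if saving <= 0:
        return float("inf")   # automation never pays for itself
    return (capital_cost / years + annual_support) / saving


def payback_years(capital_cost, annual_support, manual_cost_per_sample,
                  automated_cost_per_sample, samples_per_year):
    """Years to recover the capital cost at a given workload (simple, undiscounted)."""
    net_annual = (samples_per_year * (manual_cost_per_sample - automated_cost_per_sample)
                  - annual_support)
    return capital_cost / net_annual if net_annual > 0 else float("inf")


# Hypothetical figures: $150,000 system, $15,000/yr support, $40 vs $10 per sample, 5-year life.
print(break_even_samples(150_000, 15_000, 40, 10, years=5))             # 1500.0
print(payback_years(150_000, 15_000, 40, 10, samples_per_year=3_000))   # 2.0
```

A workload well below the break-even figure is the “sufficient work to be done” condition failing.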

Robotics is often one of the first things people think of when automation is suggested. It is also one of the more difficult to engineer because of the need for expertise in electromechanical equipment, software development, and interfacing between control systems and the robotic components themselves. If automation is a consideration, then the user should look first to commercial products. Vendors are looking for any opportunity to provide tools for the laboratory market. Many options have already been exploited or are under development. Present your needs to vendors and see what their reaction is. As noted earlier, user needs drove the development of autosamplers – essentially robots. Aside from purpose-built robotic add-ons to laboratory instrumentation, a common approach is user-designed robotics. We’ll cover more on the subject of user-built systems below, but the bottom line is that they are usually more expensive to design, build, and maintain than the original project plan allows for.

Robotics has a useful role in sample preparation, but that requires careful consideration of where the samples come from and where they go. The most successful applications of robotics are those in which the samples are in a standardized container and the results of the preparation are similarly standardized. As noted above, this is one reason sample processing using microplates has been so successful. The standard plate format, with a fixed geometry for sample well placement, means that the position of each sample is predictable, and equipment can be designed to take advantage of that.
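The predictability that microplates offer is easy to see in code: a well name maps to a physical position by arithmetic alone. In this sketch, the default 9 mm pitch and the A1 offsets are the commonly cited ANSI/SLAS footprint values for 96-well plates. The offsets are 14.38 mm from the left edge and 11.24 mm from the top edge. Treat these values as assumptions and check them against the plate manufacturer’s drawing. Formats with double-letter rows, such as 1536-well plates, are not handled.

```python
import re


def well_position(well, rows=8, cols=12, pitch_mm=9.0, a1_x_mm=14.38, a1_y_mm=11.24):
    """(x, y) of a well centre in mm from the plate's top-left corner (defaults: 96-well)."""
    match = re.fullmatch(r"([A-Z])(\d{1,2})", well.strip().upper())
    if not match:
        raise ValueError(f"unrecognized well name: {well!r}")
    row = ord(match.group(1)) - ord("A")   # A -> 0, B -> 1, ...
    col = int(match.group(2)) - 1          # 1 -> 0, 2 -> 1, ...
    if not (0 <= row < rows and 0 <= col < cols):
        raise ValueError(f"{well} is not on a {rows} x {cols} plate")
    return round(a1_x_mm + col * pitch_mm, 2), round(a1_y_mm + row * pitch_mm, 2)


print(well_position("A1"))    # (14.38, 11.24)
print(well_position("H12"))   # (113.38, 74.24)
```

A gripper or pipetting head built around this geometry never has to “find” a sample. That is the engineering advantage standard containers provide.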

Sample preparation with non-standard containers requires specialized engineering to accommodate variations. Early robotic systems used a variety of grippers to grasp test tubes, flasks, bottles, etc. However, these proved unworkable for long-term applications, often requiring frequent adjustments.

Sample storage management is another area where robotics has a potential role but is constrained by the lack of standardization in sample containers, a point that varies widely by industry. In some cases, samples arrive in a variety of containers and have to be handled manually to organize them in a form that a sample inventory system can manage. This is particularly true of samples that originate with consumers (water testing, for example). Life sciences applications can have standardized sample formats. However, some, such as biobanking[18], can have retrieval issues because the samples may be stored in multi-level freezers that make them difficult to access.

A basic sample storage management system would be linked to a LIMS, have an inventory of samples with locations, appropriate environmental controls, a barcode system to make labels machine-readable, and if robotics were considered, be organized so that a robot could have access to all materials without disrupting others. It would also have to interface cleanly with the sample preparation and sample disposal functions.
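A basic sample storage inventory of the kind just described can be sketched as a mapping from barcode to a fixed location. It includes a check that the barcode belongs to a sample already logged in to the LIMS. The storage-unit layout, the first-free-slot placement rule, and all names here are hypothetical. Environmental monitoring and robot access planning are left out.

```python
from dataclasses import dataclass, field


@dataclass
class SampleStore:
    """Hypothetical storage inventory: barcode -> (unit, shelf, slot)."""
    units: dict                                   # unit name -> (shelves, slots per shelf)
    location: dict = field(default_factory=dict)  # barcode -> (unit, shelf, slot)

    def check_in(self, barcode, lims_sample_ids, unit):
        """Store a sample in the first free slot of `unit`; returns its location."""
        if barcode not in lims_sample_ids:
            raise KeyError(f"{barcode} is not logged in to the LIMS")
        if barcode in self.location:
            raise ValueError(f"{barcode} is already stored at {self.location[barcode]}")
        shelves, slots = self.units[unit]
        taken = set(self.location.values())
        for shelf in range(1, shelves + 1):
            for slot in range(1, slots + 1):
                if (unit, shelf, slot) not in taken:
                    self.location[barcode] = (unit, shelf, slot)
                    return unit, shelf, slot
        raise RuntimeError(f"{unit} is full")

    def check_out(self, barcode):
        """Remove a sample from the inventory (for testing or disposal); returns where it was."""
        return self.location.pop(barcode)


store = SampleStore({"FRZ-1": (2, 2)})
logged_in = {"S-001", "S-002", "S-003"}
print(store.check_in("S-001", logged_in, "FRZ-1"))   # ('FRZ-1', 1, 1)
print(store.check_in("S-002", logged_in, "FRZ-1"))   # ('FRZ-1', 1, 2)
```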

Artificial intelligence (AI) applications in the lab

This is both an easy and a difficult subject to write about. It is easy because anything written will be out of date as soon as it is produced. It is difficult because we really have no idea where things are going. What we think of as an advanced AI now will be superseded next month, and probably already has been in classified intelligence circles. With that said, there are still some useful things to say about AI in laboratory informatics that would benefit you.

We need to be very cautious about the application of AI to lab work, if nothing else, because you are signing your name to lab results, not the AI. If something is challenged, or found to be a problem, you are responsible and accountable for the work. This is particularly problematic for AI usage since we can’t trace back to the cause of the problem. In traditional programming, we have the code and can find the source of a programming problem. There is no traceback with AI systems as they currently exist. Was there a coding issue? Was there a training issue? Was the training data set biased?

The application of computing technology to lab work has placed layers of computing between the scientist and the actual measurements made in the lab. Each layer holds the potential for problems to develop and for assumptions to be made that the analyst may not be aware of. Put a sample into the instrument, and the computer produces a report. Between the analog output of the instrument (the instrument’s actual measurement) and what the analyst sees, we have:

  • The analog-to-digital converter (ADC), which reads the signal and converts it to a digital value; there are different types of ADC, some better suited to certain applications than others;
  • Noise filtering, which may precede the ADC or exist in the form of digital signal processing;
  • Software that converts the raw data to a more usable form, ranging from a simple pass-through (nothing needed) to peak detection, baseline correction, overlapping peak resolution, peak size calculations, etc.;
  • Quantitative analysis based on the previous points for samples and reference materials;
  • Hopefully, an evaluation for anomalies and segregation of problem samples; and
  • Whatever capabilities an AI provides.

None of this is a problem as long as the analyst is aware of what is going on and has taken steps to ensure that none of those actions adversely affects the results of the analysis. This is the direction laboratory computing is taking. It provides a useful basis for laboratory automation systems that will reduce the cost of doing work and free overloaded personnel for other work. AI is another potentially useful layer in streamlining the work you are signing your name to. Similar concerns apply to automated sample preparation.
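A toy signal chain can make the layers listed above tangible. It consists of an ideal ADC, a moving-average noise filter, a linear baseline correction, a threshold-based peak integration, and external-standard quantitation. This is a simplified sketch on synthetic data, assuming a single well-resolved peak. Real instrument data systems use far more sophisticated, vendor-specific algorithms, which is exactly why the analyst needs to know what each layer is doing. All function names, parameters, and thresholds are assumptions chosen for illustration.

```python
import math
import random


def adc(signal, full_scale=1.0, bits=12):
    """Ideal ADC: clip to 0..full_scale and round to the nearest of 2**bits levels."""
    step = full_scale / (2 ** bits - 1)
    return [round(min(max(v, 0.0), full_scale) / step) * step for v in signal]


def moving_average(data, width=5):
    """Noise filter: centred moving average (shorter windows at the edges)."""
    half = width // 2
    out = []
    for i in range(len(data)):
        window = data[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out


def subtract_baseline(data, edge=10):
    """Baseline correction: straight line through the means of the first and last `edge` points."""
    left, right = sum(data[:edge]) / edge, sum(data[-edge:]) / edge
    last = len(data) - 1
    return [v - (left + (right - left) * i / last) for i, v in enumerate(data)]


def peak_area(data, dt, threshold):
    """Trapezoidal area from the first to the last point above `threshold` (one peak assumed)."""
    above = [i for i, v in enumerate(data) if v > threshold]
    if not above:
        return 0.0
    seg = data[above[0]: above[-1] + 1]
    return dt * (sum(seg) - (seg[0] + seg[-1]) / 2)


def analyze(signal, dt=0.01, threshold=0.01):
    """Raw analog trace -> peak area, passing through each processing layer in turn."""
    return peak_area(subtract_baseline(moving_average(adc(signal))), dt, threshold)


def synthetic_trace(height, centre=5.0, sigma=0.3, baseline=0.05, noise=0.002,
                    dt=0.01, n=1000, seed=0):
    """Gaussian peak on a flat baseline with a little seeded random noise."""
    rng = random.Random(seed)
    return [baseline + height * math.exp(-((i * dt - centre) ** 2) / (2 * sigma ** 2))
            + rng.gauss(0.0, noise) for i in range(n)]


# External-standard quantitation: concentration scales with peak area.
std_conc = 10.0                                   # hypothetical standard, mg/L
std_area = analyze(synthetic_trace(0.50, seed=1))
sample_area = analyze(synthetic_trace(0.25, seed=2))
sample_conc = std_conc * sample_area / std_area   # close to 5 mg/L for this synthetic sample
```

Changing `threshold`, `width`, or `edge` shifts the reported concentration slightly. That is a small-scale version of the hidden assumptions the analyst is signing for.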

There is a long list of applications that could benefit from AI technologies, and they are being explored and exploited. The question of what those applications are was put to ChatGPT-4 and it responded with:

Prompt: What are some good examples of successful applications of artificial intelligence applied to laboratory work?

Certainly! Artificial Intelligence (AI) and Machine Learning (ML) have made significant inroads into laboratory work across various scientific disciplines. Here are some successful applications:

  1. Automated Image Analysis:
    • Pathology: Automated detection and classification of tumors in histopathological slides.
    • Microbiology: Automated identification and counting of bacterial colonies or reading antibiotic susceptibility tests.
    • Cell Biology: Analyzing cell morphology, cell counting, or detecting cellular events in microscopy images.
  2. Predictive Analytics:
    • Drug Discovery: Predicting drug interactions and potential drug candidates by analyzing chemical structures.
    • Genomics: Predicting potential disease genes or understanding gene function.
  3. Laboratory Automation and Robotics:
    • Sample Handling: Robots equipped with AI can adaptively handle and process samples.
    • Optimization: AI can optimize experimental conditions such as reaction times, temperatures, or concentrations.
  4. Data Mining:
    • Literature Analysis: AI can scan vast amounts of published research to identify trends, potential collaborations, or overlooked conclusions.
    • High-throughput Screening: Analyzing and identifying compounds with desired properties from large libraries.
  5. Spectroscopy and Chromatography:
    • Analyzing complex spectra or chromatograms, distinguishing minute differences or predicting material properties based on spectral data.
  6. Predictive Maintenance:
    • Predicting when equipment needs maintenance or identifying early signs of malfunction, ensuring that lab operations run smoothly.
  7. Natural Language Processing (NLP):
    • Automated transcription of lab notes or voice commands.
    • Extracting structured data from unstructured lab reports.
  8. Proteomics & Metabolomics:
    • Analyzing complex data sets to identify biomarkers or metabolic pathways.
    • Mass spectrometry data interpretation and peptide identification.
  9. Clinical Decision Support:
    • Assisting doctors in diagnosing diseases based on lab results, medical history, and other data sources.
  10. Environment Monitoring:
    • Detecting pollutants or pathogens in environmental samples using pattern recognition.
  11. Simulation & Modeling:
    • Computational chemistry: Predicting molecular behaviors or drug-target interactions.
    • Systems biology: Modeling complex biological systems or pathways.
  12. Quality Control & Assurance:
    • Detecting anomalies or outliers in routine lab measurements.
    • Predicting and ensuring the consistency of lab results over time.
  13. Biobanking:
    • Optimizing storage and retrieval of biological samples using pattern recognition and predictive analytics.
  14. Genetic Analysis and Personalized Medicine:
    • Predicting disease susceptibility or drug response based on genetic data.
  15. 3D Bioprinting:
    • Using AI to optimize the design and production of 3D-printed tissues or organs.


These are just a few of the many applications of AI in laboratory settings. As technology advances, it’s likely that AI will become an even more integral part of laboratory work, driving efficiencies and enabling new discoveries.

While that is an interesting list, more specifics would be useful.


Prompt: Can you provide references for those examples?

Of course! While I can’t directly link you to the references, I can mention landmark papers, well-known projects, or general topics that you can further search to access the original sources. Here’s a non-exhaustive list based on the applications mentioned:

  1. Automated Image Analysis:
    • Esteva, A., et al. (2017). Dermatologist-level classification of skin cancer with deep neural networks. Nature.
    • Coudray, N., et al. (2018). Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning. Nature Medicine.
  2. Predictive Analytics:
    • Wallach, I., et al. (2015). AtomNet: A deep convolutional neural network for bioactivity prediction in structure-based drug discovery. arXiv preprint.
  3. Laboratory Automation and Robotics:
    • Gil, Y., et al. (2018). Toward the Geoscience Paper of the Future: Best Practices for Documenting and Sharing Research from Data to Software to Provenance. Earth and Space Science.
  4. Data Mining:
    • Jensen, L. J., et al. (2006). STRING 8—a global view on proteins and their functional interactions in 630 organisms. Nucleic acids research.
  5. Spectroscopy and Chromatography:
    • Pasolli, E., et al. (2016). Machine learning meta-analysis of large metagenomic datasets: tools and biological insights. PLoS computational biology.
  6. Natural Language Processing (NLP):
    • Jensen, L. J., et al. (2006). Literature mining for the biologist: from information retrieval to biological discovery. Nature reviews genetics.
  7. Proteomics & Metabolomics:
    • Smith, C. A., et al. (2006). XCMS: Processing mass spectrometry data for metabolite profiling using nonlinear peak alignment, matching, and identification. Analytical chemistry.
  8. Clinical Decision Support:
    • Rajkomar, A., et al. (2018). Scalable and accurate deep learning with electronic health records. NPJ Digital Medicine.
  9. Environment Monitoring:
    • Aksenov, A. A., et al. (2014). Global survey of the oxygen isotope composition of bioavailable soil water. Earth and Planetary Science Letters.
  10. Simulation & Modeling:
    • Duvenaud, D., et al. (2015). Convolutional networks on graphs for learning molecular fingerprints. Advances in neural information processing systems.
  11. Quality Control & Assurance:
    • Hinton, G. E., et al. (2012). Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint.
  12. Genetic Analysis and Personalized Medicine:
    • LeCun, Y., et al. (2015). Deep learning. Nature.

These references are based on well-accepted papers and findings up to my last training cut-off in January 2022. You might want to search for these papers or topics in databases like Google Scholar, PubMed, or academic journals for full access and more recent works on the subject.

The URLs in italics are ones that have been verified.

This latter list is consistent with material that has been found in advertising and the literature. Mettler Toledo has been advertising an AI system for its thermal analysis products.[19]  Articles are also available for the automated analysis of spectra (Automatic materials characterization from infrared spectra using convolutional neural networks[20]). There is an interview with ChatGPT about how it sees applications to spectroscopy.[21]

The approach Mettler takes with thermal analysis is interesting. It presents the results as a suggested analysis of the data and makes it easy for the analyst to adjust settings if they disagree with that result. This means the analyst has to examine each result. After review, the analyst accepts responsibility for the work by either making adjustments or accepting things as they are. That requires an experienced analyst. The problems begin if we skip that step in pursuit of fully automated facilities, rush to release results, or put the tools in the hands of inexperienced personnel.
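The review step described above can be enforced in software rather than left to habit. This is a hypothetical sketch, not Mettler’s implementation, whose internals aren’t described here. An AI-proposed value cannot be released until a named analyst has either accepted it or replaced it, and the record keeps whether an adjustment was made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SuggestedResult:
    sample_id: str
    suggested: float                    # value proposed by a (hypothetical) AI model
    final: Optional[float] = None
    reviewed_by: Optional[str] = None
    adjusted: bool = False

    def accept(self, analyst: str) -> None:
        """Analyst agrees with the suggestion and takes responsibility for it."""
        self.final, self.reviewed_by, self.adjusted = self.suggested, analyst, False

    def adjust(self, analyst: str, value: float) -> None:
        """Analyst disagrees and records their own value instead."""
        self.final, self.reviewed_by, self.adjusted = value, analyst, True


def release(results):
    """Release only reviewed results; any unreviewed AI output blocks the whole batch."""
    pending = [r.sample_id for r in results if r.reviewed_by is None]
    if pending:
        raise RuntimeError(f"not yet reviewed by an analyst: {pending}")
    return {r.sample_id: (r.final, r.reviewed_by, r.adjusted) for r in results}


batch = [SuggestedResult("S-001", 152.3), SuggestedResult("S-002", 148.9)]
batch[0].accept("jdoe")
batch[1].adjust("jdoe", 149.6)
print(release(batch))
# {'S-001': (152.3, 'jdoe', False), 'S-002': (149.6, 'jdoe', True)}
```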

AI offers many potential benefits to laboratory work, something marketers are going to exploit to attract customers. Product offerings need to be viewed skeptically, as the benefits may be less than promised or the products not as fully tested as needed. A spelling checker’s improper word substitutions are visible on the page. AI-generated errors in data analysis are dangerous because they may go undetected.

Systems development: tinkering vs. engineering

One common practice in laboratory work, particularly in research, is modifying equipment or creating new configurations of equipment and instruments to get work done. That same thought pattern often extends to software development; components such as spreadsheets, compilers, database systems, and so on are common parts of laboratory computer systems.  Many people include programming as part of their list of skills.  That can lead to the development of special purpose software to solve issues in data handling and analysis.

That activity is potentially problematic in an industrial lab. Organizations control what software development is permitted so that organizational security isn’t compromised and development activities don’t create problems with organizational or regulatory requirements and guidelines.

If a need for software develops, there are recognized processes for defining, implementing, and validating those projects. One of the best known in laboratory science comes from the International Society for Pharmaceutical Engineering (ISPE) and is known as Good Automated Manufacturing Practice (GAMP)[22]. There is also a discussion of the methodology in “Considerations in the Automation of Laboratory Procedures”[23].

The development process begins with a “needs analysis” that describes why the project is being undertaken, what it is supposed to accomplish, and the benefits of doing it. That is followed by a user requirements document that has to be agreed upon before the project begins. Once that is done, one or more prototype systems can be developed to explore options for development, project requirements, etc. What is learned forms the basis of a design specification for the actual project, and the prototype is scrapped. From this point, the rest of the GAMP process is followed through to the completion of the project. The end result is a proven, working, documented system that can be relied upon (based on evidence) to work and be supported. If changes are needed, the documentation is there to support that work.
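One artifact commonly used to show that a GAMP-style project is complete is a requirements traceability matrix. Every user requirement should map to something in the design and to a test that verifies it. The sketch below checks such a matrix for gaps. The ID formats (URS-, DS-, TC-) and the dictionary layout are illustrative assumptions, not a format GAMP prescribes.

```python
def trace_gaps(requirements, design, tests):
    """Find traceability gaps.

    requirements: {requirement_id: description}
    design:       {design_item_id: set of requirement_ids it implements}
    tests:        {test_id: set of requirement_ids it verifies}
    """
    required = set(requirements)
    designed = set().union(*design.values())
    tested = set().union(*tests.values())
    return {
        "not_designed": sorted(required - designed),             # no design element
        "not_tested": sorted(required - tested),                 # nothing verifies it
        "unknown_refs": sorted((designed | tested) - required),  # points at a missing requirement
    }


requirements = {"URS-1": "Calculate assay from peak areas",
                "URS-2": "Audit trail on every result edit",
                "URS-3": "Export results to the LIMS"}
design = {"DS-1": {"URS-1", "URS-2"}}
tests = {"TC-1": {"URS-1"}, "TC-2": {"URS-4"}}
print(trace_gaps(requirements, design, tests))
# {'not_designed': ['URS-3'], 'not_tested': ['URS-2', 'URS-3'], 'unknown_refs': ['URS-4']}
```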

Following that process is an engineering approach to systems development. The resulting systems may be software, a sample preparation process, or the implementation of an automated test method. The approach ensures that things work. It also ensures that if the developer is no longer available, the project can still be used, supported, modified, etc., as needed. The organization’s investment is protected, the data and information produced can be supported and treated as reliable, and all guidelines and regulations are met.

Examples of laboratory informatics use in industry

The laboratory informatics tools noted here are widely applied across industries, each with its own variations depending on the work environment. The following list is taken from the site of one vendor the author is familiar with; other vendors have their own perspectives. In each case, there are three things to look for: how the laboratories in each industry work, the need for laboratory accreditation, and the need to meet regulatory requirements. This is far from an exhaustive list, but it will give you an idea of the variety of lab environments that exist.

In closing…

The purpose of this document is to give the student a high-level overview of the purpose and use of the major laboratory informatics systems commonly used in industrial research and service laboratories. This is an active area of development, with new products and platforms released annually, usually around major conferences. The links included are starting points for exploring the material in more depth.




[1] Formerly referred to as “Laboratory Automation Engineers”; the name changed because of the broader range of topics that need to be addressed.
[2] ASTM at one time stood for American Society for Testing Materials but has shortened the name as its breadth of work increased. Its primary role was developing standardized testing procedures that were used industry-wide.
[7] ALCOA was the original framework, which was extended to ALCOA-CCEA.
[8] Schmitt, S., “Assuring Data Integrity for Life Sciences”, 2016, ISBN: 1-933722-97-5.
[9] CFR stands for Code of Federal Regulations, which has been updated to an electronic version, the Electronic Code of Federal Regulations (eCFR).
[16] and
[17] “Considerations When Implementing Automated Methods into GxP Laboratories”