3.1. Control charts for proteomics data

As discussed above, the first task to be carried out is system characterisation. log2 transformed raw feature volume data from all of the standards was used to highlight additional considerations and visualisations before systematically considering multiple transforms and different data types.

Consideration of the meta data in Section 2 indicates many factors which may be expected to affect the bulk of the features simultaneously (e.g. the PMT voltage setting on the scanner). This suggests that an obvious first metric for exploratory system characterisation is some form of global measure. Fig. 6 from [9] shows the design plot based on a subset of the meta data factors and how they group a global measure (the median values of all features on a gel).

In general, there is no requirement to use a global metric. A subset of features could be used and can provide more sensitivity to specific issues [7]. There is also no limit on the number of metrics or rules used, except that the portfolio of rules should be considered as a whole in cost-benefit terms. The metric presented in all of the graphs and analysis in this section is the median value of all of the 1004 features from a gel image under the specified transformation or normalisation scheme. All the results have used batches 2–6 as a reference set for calculation of the mean and control limits, and for the purposes of normalisation when applicable.

The various control chart versions in this section add more and more information to the plots to assist in the exploration of 'assignable cause'. In general this involves layering information from meta data factors onto the chart.

Fig. 3 shows a control chart based on taking the chosen metric for each of the standard gels. This chart is slightly different to those presented in the Introduction. The observations are in '2D Gel Batch' order and the batch information is now shown on the plot.
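The metric just described is straightforward to compute. The sketch below (in Python rather than the R used for the analysis in this paper, and with hypothetical feature volumes rather than the 1004 matched features of the study) derives one control-chart point per gel as the median of its log2 transformed feature volumes.

```python
import math

def gel_metric(feature_volumes):
    """Per-gel QC metric: the median of log2-transformed feature volumes.

    `feature_volumes` holds one gel's raw spot volumes (illustrative
    values only; the paper uses the 1004 matched features per gel).
    """
    logged = sorted(math.log2(v) for v in feature_volumes if v > 0)
    n = len(logged)
    mid = n // 2
    return logged[mid] if n % 2 else 0.5 * (logged[mid - 1] + logged[mid])

# One metric value per gel image, plotted in run order on the control chart.
gels = {"gel_01": [1024.0, 4096.0, 2048.0], "gel_02": [512.0, 2048.0, 8192.0]}
chart_points = {name: gel_metric(vols) for name, vols in gels.items()}
```

Using the median of many features, rather than a single feature, makes the metric robust to a minority of aberrant spots.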
Visually, there is already a suggestion that assignable cause variation is present, with some correlation to '2D Gel Batch'.

Fig. 4 shows the distributional information for the data shown in Fig. 3. Visually there may be some suggestion of non-normality, but the distribution would not really suggest an issue with particular batches.

Given that visually several batches appear to be subject to some form of assignable cause, we can explore defining limits that would objectively report this. In the Introduction, the creation of statistical limits and the possibilities of windowing data were discussed, but details of how the mean and limits should be calculated were not. Shewhart [16] formulated the initial work in this area. The Shewhart version of the control chart usually calculates limits from batched observations and utilises running limits. Subsequently, Levey and Jennings [17] developed a version that is more convenient for routine use in clinical chemistry laboratories. Their approach utilises a number of reference samples from which the limits are derived. They suggest that the reference set should remain valid for a certain time period or reagent batch, after which the system should be re-assessed and potentially re-calibrated. There is an obvious trade-off between generating stable and meaningful limits and the number of standards you need to produce them.

There are no clear specifications on how many samples should be used to derive the population metrics. Fig. 5 shows how the estimate of the mean and sd cumulatively develops across the data set as increasing numbers of observations are included. Despite the variation present across the whole graph, the estimates are relatively stable after 20–40 observations, even with the inclusion of the scanner setting induced shifts in batches 8, 10 and 12.
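The Levey–Jennings scheme described above reduces to a simple calculation: estimate the mean and standard deviation from a reference set, then flag observations outside mean ± k·sd. A minimal sketch (illustrative numbers, not the study's data; the paper's reference window is batches 2–6):

```python
import statistics

def levey_jennings_limits(reference, k=3.0):
    """Mean and +/- k*sd control limits estimated from a reference set
    of metric values (one value per gel)."""
    mean = statistics.mean(reference)
    sd = statistics.stdev(reference)  # sample standard deviation
    return mean, mean - k * sd, mean + k * sd

def out_of_control(values, lower, upper):
    """Indices of observations falling outside the control limits."""
    return [i for i, v in enumerate(values) if not (lower <= v <= upper)]

reference = [10.1, 9.9, 10.0, 10.2, 9.8]   # hypothetical reference metrics
mean, lo, hi = levey_jennings_limits(reference)
flags = out_of_control([10.0, 10.1, 12.5, 9.9], lo, hi)
```

Tightening k from 3 to 2 widens the set of flagged points, which is the trade-off discussed below between sensitivity and false positive 'out of control' events.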
Over the full set it appears that using smaller numbers of observations to assess the mean and control limits has a tendency to underestimate the variance, resulting in narrower limits and potentially more false positive 'out of control' events. Consider Fig. 6; this uses the standard gels from 2D gel batches 2 through 6, which results in tighter limits and the 'rejection' of three observations which were not rejected when the limits were set using all observations. Also note in Fig. 6 that stricter 2 sd limits (which would flag up all points in the yellow or red areas) would have at least one 'rule fail' for all of the batches that look to have shifts visually.

The first question that occurs once the 'black box' is reported as being 'out of control' is: 'Is there an assignable cause?'. This leads to consideration of batching and meta data. As previously mentioned, the [9] study recorded many meta data factors whilst processing experimental samples. These range across reagent batch, the scanner used and its settings, and which gels were run in the same tank and their position in the tank. Generally experimenters have to consider the factors that they believe could systematically influence their results and at least record the details. Batch meta information can be layered on top of the control chart to allow for a visual inspection of potential factors affecting process performance. Fig. 7 shows such a display.

The original experiment was not designed to allow full separation of the meta factors and as such they are not all independent. This can be explored more formally using variable clustering. Fig. 8 shows the output from the varclus command from the rms package [12] in R [11]. In this analysis the meta factors are themselves clustered together (irrespective of the measured feature data in the gels). This shows which meta data factors are related.
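The essence of variable clustering is grouping factors by their pairwise association. The sketch below is a deliberately crude Python analogue of varclus (greedy single-linkage on absolute correlation, with factors numerically coded and values hypothetical); the rms implementation offers proper similarity measures and hierarchical output.

```python
import math
import statistics

def abs_corr(x, y):
    """Absolute Pearson correlation used as a similarity between two
    numerically coded meta factors (a simplification of varclus)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return abs(cov / (sx * sy))

def cluster_factors(factors, threshold=0.7):
    """Greedy single-linkage grouping: a factor joins the first cluster
    containing a member with similarity above the threshold."""
    clusters = []
    for name in factors:
        for cl in clusters:
            if any(abs_corr(factors[name], factors[m]) > threshold for m in cl):
                cl.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Hypothetical coded factors: gel batch and IEF batch move together,
# while the scanner PMT voltage varies independently of them.
meta = {
    "gel_batch": [1, 1, 2, 2, 3, 3],
    "ief_batch": [1, 1, 2, 2, 3, 4],
    "pmt_voltage": [500, 550, 500, 600, 550, 500],
}
groups = cluster_factors(meta)
```

When factors land in the same cluster, as 'gel_batch' and 'ief_batch' do here, no single one of them can be isolated as a lone cause of variation, which is exactly the situation Fig. 8 reports for this gel set.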
In this gel set, 'Labelling batch', '2D Gel Batch' and 'IEF batch for IPG strips' are closely related, so it is unlikely that one of these factors can be isolated as a lone cause of variation.

The final tool used is the lag plot, which allows us to assess cyclical behaviour within sample windows, and which was also discussed in the Introduction. Fig. 9 shows a version where additional information is provided. The observations have been coloured by '2D Gel Batch' (and also numbered), which gives a visual indication as to whether outlier behaviour is batch related. A simple set of data bounds has been added which depicts where most of the data is expected to reside, given estimates based on the data itself and normality assumptions. The inner ellipse is expected to contain 50% of the data and the outer ellipse 95% (using the dataEllipse function from the car package). For a normally distributed random process, a uniform cloud of points is expected to develop. In Fig. 9 we can see that this appears to be the case for most of the batches, but batches 8, 10 and 12 deviate from this distribution.

The visualisations and metrics discussed do not report on the absolute performance of the system. Being 'in control' does not mean a system has high performance; it simply means that it is operating in the manner we would expect it to given some window of past performance. It is obvious that a 'noisy system' will have wider absolute limits than a 'low noise' system. Our focus in this paper is on initially characterising the state of the system, providing a route to improvement and creating rules that objectively warn when system performance may be unexpectedly changing. The practitioners may well wish to benchmark their system against other such systems as part of the characterisation process, but this is beyond the scope of this paper.

The tools and visualisations described can now be used to explore properties of the data set and give an indication of the sorts of QC issues that SPC could objectively report on.
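The 50% and 95% coverage ellipses have a convenient closed form under bivariate normality, since the chi-square distribution with 2 degrees of freedom has quantile -2·ln(1-p). The sketch below computes the ellipse radii and classifies lag-plot points; for simplicity it assumes independent axes, whereas dataEllipse-style ellipses use the full covariance.

```python
import math

def ellipse_radius(p):
    """Mahalanobis radius of the ellipse expected to contain a fraction p
    of bivariate normal points: sqrt of the chi-square(2 df) quantile,
    which has the closed form -2*ln(1-p)."""
    return math.sqrt(-2.0 * math.log(1.0 - p))

def inside_ellipse(x, y, mean_x, mean_y, sd_x, sd_y, p):
    """Check a lag-plot point (metric at t versus t+1) against the
    coverage ellipse, assuming independent axes for simplicity."""
    d2 = ((x - mean_x) / sd_x) ** 2 + ((y - mean_y) / sd_y) ** 2
    return d2 <= ellipse_radius(p) ** 2

r50, r95 = ellipse_radius(0.50), ellipse_radius(0.95)
```

Points consistently landing outside the 95% ellipse, as batches 8, 10 and 12 do in Fig. 9, are the lag-plot signature of assignable cause variation.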
3.2. Standards log2 transformed raw volume

Fig. 7 shows the data for all of the log2 raw volume results from the standards. As noted earlier, the chart is unremarkable except for apparent excursions in batches 8, 10 and 12, most likely linked to scanner settings. It is interesting that batch 1 (with the known running error that caused around one fifth of the gel features to be missing) looks very well behaved under this scheme and metric. This is not entirely unreasonable given the way the gel scans were optimised for scanning and the use of the median of a large number of features.

Fig. 9 shows the lag plot for the same data. As mentioned in Section 3.1, the points appear to be clustered reasonably tightly, with notable excursions for batches 8, 10 and 12.

Fig. 10 shows a design plot similar to the one presented in the Jackson et al. [9] paper. The design plot is another alternative visualisation of the data shown in the control chart. Design plots were discussed in detail by Jackson et al. [9]. Briefly, instead of focussing mainly on a single meta data batch factor, the design plot lists a number of factors and shows how the median of the observations included under different batch schemes would plot on the control chart. The chart has been extended to show the same statistical limits as the control charts. We can see from this data presentation that the most outlying 2D gel batches are 12, 10, 8 and 13, which supports the conclusions of the previous two figures. There is a suggestion that IEF batch for IPG strip 10 is an outlier, but as we noted in the variable clustering analysis (Fig. 8), the meta data does not offer us the capability to really separate out the factors with any confidence. The 'PMT Voltage for Cy3' data seems to suggest that the PMT voltage setting has quite an impact on the metric being used, as we may expect given this is raw scan data.
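The design-plot values described above amount to re-grouping the same per-gel metrics under each meta factor in turn and taking group medians. A small sketch (hypothetical metrics and a hypothetical PMT-voltage grouping; the paper's design plots cover many factors at once):

```python
import statistics

def design_plot_points(metric_by_gel, factor_levels):
    """For one meta factor, the median metric of the gels at each level:
    the values a design plot places alongside the control limits."""
    by_level = {}
    for gel, level in factor_levels.items():
        by_level.setdefault(level, []).append(metric_by_gel[gel])
    return {level: statistics.median(vals) for level, vals in by_level.items()}

# Hypothetical per-gel metrics and a PMT-voltage grouping of the gels.
metric = {"g1": 10.0, "g2": 10.2, "g3": 12.6, "g4": 12.4}
pmt = {"g1": 500, "g2": 500, "g3": 600, "g4": 600}
points = design_plot_points(metric, pmt)
# Compare each group median against (hypothetical) control limits.
outliers = {lv for lv, m in points.items() if not (9.5 <= m <= 10.7)}
```

Repeating this over every recorded factor, and plotting the group medians against the same statistical limits as the control chart, reproduces the essential content of Fig. 10.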
3.3. Standards under Variance Stabilisation Normalisation

The original study utilised a scanning rule that sought to standardise batch scanning by optimising the value of a few features on the gel (serotransferrin). Normalisation schemes, such as VSN, can utilise information from many features, and through this can provide more reliable estimates of the behaviour of the whole population that are more robust to outliers. They also add assumptions about the underlying distributions of the data. It can be useful to explore the impact of these assumptions by adding analysis steps in stages. The previous section showed the raw data; now we explore how further processing and assumptions can change our view of the data.

Fig. 11 shows the control chart for the standards under the VSN scheme. It is clear that the normalisation scheme has successfully compensated for the differences in scanner setting, with batches 8, 10 and 12 now appearing to fit in with the rest of the batches. Batch 1 (with gel over-run issues) is now clearly showing as an outlier.

Fig. 12 tells a similar story, with a tighter cluster of batches showing batch 1 as a clear outlier. There is also some suggestion that batch 14 may have an undiagnosed issue.

Fig. 13 clearly shows how the VSN transformation has reduced the impact of the scanner settings, with the ANOVA for PMT Cy3 now reporting no significant difference between the settings based on using the batch medians as a metric. Again, batch 1 is a clear outlier under 2D gel batch grouping and there is a suggestion of problems with batch 14. As meta factors for the IEF and labelling batches are known to correlate with the 2D gel batch, they are not considered any further here. Such alternative views are simple to produce, and we would recommend the use of multiple views and data processing schemes in parallel when exploring the system characteristics.

3.4. Samples under Variance Stabilisation Normalisation

Fig. 14 shows the control chart for the samples under the VSN scheme.
The results are essentially the same as for the standards, with batch 1 clearly an outlier. This is a very encouraging result as it suggests that, for proteomics data, the statistical process control technique not only works for repeats of the same sample but can also function when biological variation is included not only in the data itself but also in the reference set. This may not be the case for all data sets, so it is recommended that this finding should be confirmed for any given experiment design.

It is not entirely surprising that a global metric of samples behaves in a similar manner to a global metric of standards. Proteomics has been using the assumption that 'not all analytes are affected under the experimental conditions' for some time; it is assumed by almost all data normalisation schemes. The presentations in this section suggest that the same assumption can be used to consider some sample features as 'built in standards' for QC purposes. The fact that the DIGE gel pair is highly correlated could have contributed to the similarity of the result, but both analyses were conducted independently. At the least, this demonstrates that the biological variation within the sample set does not prevent technical issue rules being derived.

Fig. 15 shows the corresponding lag plot. It, too, is similar to the standards version (Fig. 9) but with less suggestion of an issue with batch 14. The design plot for the samples (Fig. 16) is also very similar to the standards version and again fails to show an issue for batch 14.

3.5. Samples under Variance Stabilisation Normalisation difference to standards

Fig. 17 shows the control chart for the VSN difference case. Batch 1 is not reported as having any issues. This is expected as the over-run affects both stains within the gel equally and hence is masked.
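This masking effect can be demonstrated numerically: an artefact that shifts both DIGE channels of a gel equally moves the single-channel metrics but cancels exactly in the difference metric. A minimal sketch with hypothetical log-scale values and a hypothetical +2 shift:

```python
def channel_metrics(sample, standard):
    """Per-gel medians of each DIGE channel and of their difference
    (log-scale values, so the difference is a log-ratio)."""
    def med(xs):
        return sorted(xs)[len(xs) // 2]  # odd-length median
    diff = [s - t for s, t in zip(sample, standard)]
    return med(sample), med(standard), med(diff)

# Hypothetical log-scale feature values for one gel's two channels.
sample = [10.0, 11.0, 12.0]
standard = [10.1, 10.9, 12.2]
# A technical artefact hitting both channels of the gel equally:
shifted_sample = [v + 2 for v in sample]
shifted_standard = [v + 2 for v in standard]

before = channel_metrics(sample, standard)
after = channel_metrics(shifted_sample, shifted_standard)
# The single-channel metrics move by 2; the difference metric does not
# change, so the artefact is invisible in the differential data.
```

This is why a QC scheme that only monitors differential data can pass a gel whose absolute signal is badly disturbed, as discussed next.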
In this case the batch quality issue was easily identified by the experimenters, but in other cases the images may be altered in a more subtle way which still matters: for example, a technical issue with albumin overloads that causes material to be deposited as streaks in an area of the gel containing other features. Such a problem may not be detected so easily, because the issue affects both sample and standard images equally and can mask the real expression values without anything unusual appearing in the differential data. Jackson and Bramwell [7] explore these issues in more detail.

It is important that an experimenter is informed when such masking may be occurring. The QC procedure can be adapted to utilise not only the whole image but also subsets of features that report on the QC of different sections of the data. It may be that the investigation of an outlier highlights such a localised issue, and a rule is then created to report specifically on this in the future (see [7] for specific examples).

There are indications of an issue with batch 14, with the meta ribbon suggesting that there may be a problem with the 2D gel batch, IEF batch or labelling batch. Batch 14 was the re-run batch of batch 1. This is a post hoc analysis, so there is little more that can be done other than be cautious about results from that batch. Additional analyses to localise the origin of a noted difference are also possible by comparing differential results versus other batches. If this analysis had been performed as part of the original experiment, it would have been possible to explore the issue more fully and potentially re-run the samples. Batch 14 did not stand out as an outlier batch in the original Jackson et al. analysis. This may be a result of the differing feature sets employed, as 4534 features were used in the original paper compared to 1004 in this analysis.

Figs. 18 and 19 show the lag and design plots for the VSN difference data.
These support the conclusions drawn from the control chart, although it is interesting to note that in the design plot the scanner setting value spread seems to have a larger relative impact than was observed in the VSN single-stain experiments. It may be that future experiments should explore whether the scanner setting optimisation strategy needs review. Simply locking the scanner settings to a pre-set value may also cause issues, as this leads to some gels losing dynamic range whilst others saturate out certain feature sets. The control chart approach gives a metric to explore this, but the impact of such procedural changes must be considered for the entire process 'end-to-end'. It is important to take a pragmatic approach and aim to identify the metrics which report on key factors, i.e. those directly impacting upon the desired quality of the experiment. Becoming unduly concerned with optimising all aspects of a process can actually be counterproductive, as this consumes valuable time and resources yet can potentially yield negligible benefits.
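The feature-subset rules suggested in Section 3.5 can be sketched simply: partition the features into named regions (for instance an area prone to albumin streaking) and evaluate one metric per region against region-specific limits. All indices, region names and limits below are hypothetical illustrations, not the study's actual rules.

```python
import statistics

def subset_rules(gel_values, regions, limits):
    """Evaluate one QC metric (the median) per named feature subset
    against per-subset control limits; returns the failing regions.
    Regions, indices and limits here are all hypothetical."""
    failures = []
    for region, idx in regions.items():
        m = statistics.median(gel_values[i] for i in idx)
        lo, hi = limits[region]
        if not (lo <= m <= hi):
            failures.append(region)
    return failures

gel = [10.0, 10.1, 9.9, 14.0, 14.2, 13.8]     # last three features shifted
regions = {"top_left": [0, 1, 2], "albumin_zone": [3, 4, 5]}
limits = {"top_left": (9.5, 10.5), "albumin_zone": (9.5, 10.5)}
flags = subset_rules(gel, regions, limits)
```

A localised shift that the whole-image median would absorb is caught by the regional rule, which is the mechanism for detecting the masked issues discussed above.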