# Introduction

Figure 1. Total cost of one million computing operations over time. Data from Nordhaus Nordhaus_01. code
Figure 2. Storage cost, in US dollars per Mbyte, of mass market technologies over time. Data from McCallum McCallum_16. code
Figure 3. Growth of transport and product distribution infrastructure in the USA (underlying data is measured in miles). Data from Grübler et al Grubler_91. code
Figure 4. Market capitalization of IBM, Microsoft and Apple (top) and expressed as a percentage of the top 100 listed US tech companies (bottom). Data extracted from the Economist website Economist_15. code
Figure 5. Unit sales of processors used in various ecosystems. Data from Gordon Gordon_87 (mainframes and minicomputers) and Hilbert et al Hilbert_11 (post 1985 hardware). code
Figure 6. Monthly unit sales (in thousands) of microprocessors having a given bus width. Data kindly supplied by Turley Turley_02. code
Figure 7. Changing habits in men’s facial hair. Data from Robinson Robinson_76. code
Figure 8. Number of papers, in each year between 1987 and 2003, associated with a particular IT topic. The E-commerce paper count peaks at 1,775 in 2000 and in 2003 is still off the scale compared to other topics. Data kindly provided by Wang Wang_10. code
Figure 9. Normal distribution with total percentage of values enclosed within a given number of standard deviations. code

# Human cognitive characteristics

Figure 10. Simon’s scissors… code
Figure 11. Could be more convincing hemispheres with light shining from above and below… code
Figure 12. Probability that rat N1 will press a lever a given number of times before pressing a second lever to obtain food, when the target count is 4, 8, 12 and 16. Data extracted from Mechner Mechner_58. code
Figure 13. Boy/girl (aged 11-12 years) verbal reasoning, quantitative reasoning, non-verbal reasoning and mean CAT score over the three tests; each stanine band is 0.5 standard deviations wide. Data from Strand et al Strand_06. code
Figure 14. Rotate text in the real world, by tilting the head, or in the mind? code
Figure 15. Two objects paired with another object that may be a rotated version. Based on Shepard et al Shepard_71. code
Figure 16. Structure of mammalian long-term memory subsystems; brain areas in red. Based on Squire et al Squire_15.
Figure 17. Percentage correct answers to questions about binary operator precedence against occurrence in source code. Data from Jones Jones_06a. code
Figure 18. Response time (left axis) and error percentage (right axis) on reasoning task with given number of digits held in memory. Data extracted from Baddeley Baddeley_09. code
Figure 19. Major components of working memory: working memory in yellow, long-term memory in orange. Based on Baddeley Baddeley_12. code
Figure 20. Yes/no response time (in milliseconds) as a function of the number of digits held in memory. Data extracted from Sternberg Sternberg_69. code
Figure 21. Parse tree of a sentence with no embedding, upper "S 1", and a sentence with four degrees of embedding, lower "S 4". Based on Miller et al Miller_64. code
Figure 22. Sequencing errors (as percentage) after interruptions of various lengths (red), including 95% confidence intervals, normal sequence error rate in green; lines are fitted model predictions. Data from Altmann et al code
Figure 23. Probability of correct recall of words by serial presentation order (each word visible for 1 or 2 seconds, last digit in legend). Data extracted from Murdoch Murdoch_62. code
Figure 24. Semantic memory representation of alphabetic letters (the numbers listed along the top are place markers and are not stored in subject memory). Readers may recognize the structure of a nursery rhyme in the letter sequences. Derived from Klahr Klahr_83. code
Figure 25. Time taken to solve the same jig-saw puzzle 35 times, followed by a two-week interval and then another 35 times, with power law and exponential fits. Data extracted from Alteneder Alteneder_35. code
Figure 26. Completion times of eight solo (upper) and eight pairs (lower) for each implementation round, along with fitted equation…. Data kindly provided by Lui Lui_06. code
Figure 27. Subjects' belief response curves for positive weak–strong, negative weak–strong, and positive–negative evidence. Based on Hogarth et al Hogarth_92. code
Figure 28. Country boundaries distort judgement of relative city locations. Based on Stevens et al Stevens_78.
Figure 29. Orthogonal representation of shape, color and size stimuli. Based on Shepard Shepard_61.
Figure 30. The six unique configurations of selecting four times from eight possibilities, i.e., it is not possible to rotate one configuration into another within these six configurations. Based on Shepard Shepard_61.
Figure 31. Percentage of correct answers given by one subject, against boolean-complexity of category, colored by number of positive cases needed to define the category. Data kindly provided by Feldman Feldman_00. code
Figure 32. The Berlin and Kay Berlin_69 language color hierarchy. The presence of any color term in a language implies the existence, in that language, of all terms below it. Papuan Dani has two terms (black and white), while Russian has eleven (Russian may also be an exception, in that it has two terms for blue). code
Figure 33. Cup- and bowl-like objects of various widths (ratios 1.2, 1.5, 1.9, and 2.5) and heights (ratios 1.2, 1.5, 1.9, and 2.4). The percentage of subjects who selected the term cup or bowl to describe the object they were shown (the paper did not explain why the figures do not sum to 100%). Based on Labov Labov_73. code
Figure 34. A commercial event involving a buyer, seller, money, and goods; as seen from the buy, sell, pay, or charge perspective. Based on Fillmore Fillmore_77. code
Figure 35. Lines of code correctly recalled after a given number of 2 minute memorization sessions; upper plot actual program, lower plot line order scrambled. Data extracted from McKeithen et al McKeithen_81. code
Figure 36. Examples of features that may be preattentively processed (parallel lines and the junction of two lines are the odd ones out). Based on Ware Ware_00.
Figure 37. Continuity: upper left plot is perceived as two curved lines; Closure: when the two perceived lines are joined at their end (upper right), the perception changes to one of two cone-shaped objects; Symmetry and parallelism: where the direction taken by one line follows the same pattern of behavior as another line; Proximity: the horizontal distance between the dots in the lower left plot is less than the vertical distance, causing them to be perceptually grouped into lines (the relative distances are reversed in the right plot); Similarity: a variety of dimensions along which visual items can differ sufficiently to cause them to be perceived as being distinct; rotating two line segments by 180° does not create as big a perceived difference as rotating them by 45°; TODO look good. code
Figure 38. Perceived grouping of items on a line may be by shape, color or proximity. Based on Kubovy et al kubovy_08. code
Figure 39. Examples of unique items among visually similar items. Those at the top include an item that has a distinguishing feature (a vertical line or a gap); those underneath them include an item that is missing this distinguishing feature. Based on displays used by Treisman et al Treisman_85. code
Figure 40. The foveal, parafoveal and peripheral vision regions when three characters visually subtend 3°. Based on Schotter et al Schotter_12. code
Figure 41. Example object layout and the corresponding ordered tree produced from the answers given by one subject. Data extracted from McNamara et al McNamara_89. code
Figure 42. Heat map of one subject’s cumulative fixations (black dots) on a screen image. Data kindly provided by Ali Ali_12. code
Figure 43. The four cards used in the Wason selection task. Based on Wason Wason_68. code
Figure 44. Probability a subject will successfully distinguish a difference between the number of dots displayed and a specified target number (x-axis is the difference between these two values). Data extracted from van Oeffelen et al van_Oeffelen_82. code
Figure 45. Line locations chosen for the numeric values seen by each of four subjects; color of fitted loess line changes at one million boundary. Data kindly provided by Landy Landy_16. code
Figure 46. Number of errors, in 132 simple multiplication trials (e.g., $3\times7$), upper plot shows operand values (a loess fit in yellow) and lower plot result value (points where both operands have the same value are in blue). Data from Campbell Campbell_97. code
Figure 47. One subject’s response time over successive blocks of command line trials and fitted loess (in green). Data kindly provided by Remington Remington_16. code
Figure 48. Subjects' estimate of their ability (x-axis) to correctly answer a question and actual performance in answering on the left scale. The responses of a person with perfect self-knowledge are given by the solid line. Data extracted from Lichtenstein et al Lichtenstein_77. code
Figure 49. Each row shows a scaled version of the three stripes, along with actual lengths in inches, from which subjects were asked to select the longest. Based on Asch Asch_56. code

# Economics

Figure 50. Months of developer effort needed to produce systems containing a given number of lines of code… Data from Gayek et al Gayek_04. code
Figure 51. Break even saving/investment ratio for various system survival rates (black 0.9, red 0.8, blue 0.7 and green 0.6) and development/maintenance ratios; system lifetimes are 5.5, 6, 6.5, 7 and 7.5 years (ordered top to bottom) code

# Software ecosystems

Figure 52. Number of process model change requests made in three years of a banking Customer registration project. Data kindly supplied by Branco Branco_12. code
Figure 53. Man hours required to build a particular kind of ship, at the Delta Shipbuilding yard, delivered on a given date. Data from Thompson Thompson_07. code
Figure 54. Dependencies between the Java packages in various versions of ANTLR. Data from Al-Mutawa Al-Mutawa_13. code
Figure 55. Number of pdf files created using a given version of the portable document format appearing on sites having a .uk web address between 1996 and 2010. Data from Jackson Jackson_12. code
Figure 56. Percentage share of total Android market at days since launch for various versions of Android. Data from Bidouille Bidouille_15. code
Figure 57. Introductory price and benchmark performance of various Intel processors over 2003-2013. Data from Sun Sun_14. code
Figure 58. Number of transistors, frequency and SPEC performance of cpus when first launched. Data from Danowitz et al Danowitz_12. code
Figure 59. Shipments, per year, of various computers between 1975 and 2012. Data from Reimer Reimer_12. code
Figure 60. Number of optional features selected by a given number of flags. Data kindly provided by Berger Berger_12. code
Figure 61. Percentage of code ported from NetBSD to various versions of OpenBSD, broken down by version of NetBSD in which it first occurred (denoted by incrementally changing color). Data kindly provided by Ray Ray_13.
Figure 62. Percentage of source in 130 releases of Linux that originates in an earlier release. Data extracted from png file kindly supplied by Matsushita Livieri_07. code
Figure 63. Number of projects making use of a given number of different languages in a sample of 100,000 GitHub projects. Data kindly supplied by Bissyande Bissyande_13. code
Figure 64. Number of software systems surviving to a given number of years and an exponential fit. Data from Tamai Tamai_92. code
Figure 65. Ratio of development costs to five year maintenance costs for 158 IBM software systems sorted by size; curve is a beta distribution fitted to the data (in red). Data from Dunn Dunn_11. code
Figure 66. Survival curves of clones in the Linux high/medium/low level SCSI subsystems. Data from Wang Wang_12. code
Figure 67. Survival curve for packages included in the standard Debian distribution. Data from Caneill et al Caneill_14. code
Figure 68. Density plot of time interval, in hours, between each modification of a function in Evolution. Data from Robles et al Robles_12a. code
Figure 69. Number of functions (in Evolution) modified a given number of times broken down by number of authors. Data from Robles et al Robles_12a. code
Figure 70. Number of functions (in Evolution; the points at zero are incorrect counts) modified a given number of times (upper) or modified by a given number of different people (lower); red line is a straight line fit, green line a quadratic fit. Data from Robles et al Robles_12a. code
Figure 71. Number of identifiers renamed in the source of Eclipse-JDT, in a given month; date of version release marked. Data from Eshkevari et al Eshkevari_11. code
Figure 72. Survival curve for table rows in Wikimedia database. Data from Curino et al Curino_08. code
Figure 73. Changes in the number of tables (66 modifications) and total number of columns (150 modifications) in the Mediawiki database schema over elapsed time and change version. Data from Curino et al Curino_08. code
Figure 74. Percentage of patches submitted to WebKit (34,535 in total) transitioning between various stages of code review. Data from Baysal et al Baysal_13. code

# Software development projects

Figure 75. Commits within a particular hour and day of week for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
Figure 76. Cone of uncertainty in estimated cost with constant accuracy and costs per time interval (top left), with 1% improvement in accuracy in each time interval (bottom left/right), and with three different spends per time interval, c(0.5*(0:30), 15+1.5*(1:30), 60+1:40), (right top/bottom). code
Figure 77. Estimated effort against actual effort (in hours). Data from Jørgensen Jorgensen_04b. code
Figure 78. Quoted bid price and estimated effort from 14 companies… Data from Anda et al Anda_09. code
Figure 79. Percentage difference in two estimates for the same six projects made by seven developers… Data from Grimstad et al Grimstad_07. code
Figure 80. Actual project duration against number of schedule estimates made for it. Data from Little Little_06. code
Figure 81. Distribution of effort (person hours) during the development of four engine control systems projects, plus non-project work and holidays, at Rolls-Royce. Data extracted from Powell Powell_01. code
Figure 82. Phase during which work on a given phase of development was actually performed. Data from Zelkowitz Zelkowitz_88. code
Figure 83. Average value assigned to requirements (red) and one standard deviation bands (blue) based on omitting one stakeholder’s value list. Data from Regnell et al Regnell_01. code
Figure 84. Number of requirements added/deleted/modified in 22 releases of a product containing eight features (upper) and total number of requirements against requirements changed for those eight features (lower). Data extracted from Felici Felici_04. code
Figure 85. Pagerank of the stakeholder nodes in the network created from the Open (green) and Closed (blue) stakeholder responses (values for each have been sorted). Data from Lim Lim_10. code
Figure 86. Average number of feature implementations started (blue) and their average duration (red); a 30 day rolling mean has been applied to both. Data kindly supplied by 7Digital 7Digital_12. code
Figure 87. Number of features whose implementation took a given number of elapsed workdays. Top first 650 days, bottom after 650 days. Green line is the fitted negative binomial distribution. Data kindly supplied by 7Digital 7Digital_12. code
Figure 88. Number of feature developments started on a given work day (red bug fixes, blue non-bug work, black ratio of two values; 20 day rolling mean bottom left, 50 day top right, 120 day bottom right). Data kindly supplied by 7Digital 7Digital_12. code

# Reliability

Figure 89. Input case on which a failure occurred, for a total of 500,000 inputs. Data from Dunham et al Dunham_86. code
Figure 90. Number of input cases processed before a given fault is experienced. Data from Dunham et al Dunham_86. code
Figure 91. Number of input cases processed before a given number of program failures is experienced; 25 replications. Data from Dunham et al Dunham_86. code
Figure 92. Time taken, in 10 distinct runs, to discover a thread safety violation in 22 different Java classes. Data kindly supplied by Pradel Pradel_12. code
Figure 93. Fraction of mutated programs, in various languages, that successfully compiled/executed/produced same output. Data from Spinellis et al Spinellis_12. code
Figure 94. Total number of failures per 30-day interval for each LANL system. Data from Los Alamos National Lab (LANL). code
Figure 95. Total number of failures for each node in the given LANL system. Data from Los Alamos National Lab (LANL). code
Figure 96. For systems 2 and 18, number of uptime intervals, binned into 10 hour intervals, red line is fitted negative binomial distribution. Data from Los Alamos National Lab (LANL). code
Margin figure. Fault slip throughs for a development project at Ericsson (left column lists when fault could have been detected, bottom row when fault was detected). Data from Hribar Hribar_08. code
Figure 97. Various test suite coverage measures and mutants killed in 300 or so Java projects; black line is a loess fit. Data from Gopinath et al Gopinath_14. code
Figure 98. Statement (triangles) and branch (stars) coverage achieved using a program’s test suite… Data from Marinescu et al Marinescu_14. code
Figure 99. Amount of source (millions of lines) in each version broken down by the version in which it first appears. Data extracted from Massacci et al Massacci_11. code
Figure 100. Market share of Firefox versions between official release and end-of-support. Data from w3schools.com. code
Figure 101. Number of people with Internet access per 100 head of population in the developed world and the whole world. Data from ITU ITU_12. code
Figure 102. Amount of end-user usage of code originally written for Firefox version 1.0 by various other versions. Data extracted from Massacci et al Massacci_11. code

# Faults

Figure 103. Transition counts of the order in which five distinct faults were discovered in 50 runs of Program A2. Data from Nagel et al Nagel_82. code
Figure 104. Number of input cases that occurred before a particular fault was experienced by program A2; the list was sorted for each fault. Data from Nagel et al Nagel_82. code
Figure 105. Number of accesses to memory address blocks, per 100,000 instructions, executing gzip on two different inputs. Data from Brigham Young Brigham_Young via Feitelson. code
Figure 106. Number of incidents reported in each of 800 applications installed on over 120,000 desktop machines. Data from Lucente Lucente_15. code
Figure 107. Power analysis (50 and 10 runs at various p-values) of detecting a difference between two runs having a binomial distribution (runs needed to achieve power=0.8 at various p-values). code
Figure 108. Percentage of usability problems found by a given number of test subjects. Data extracted from Nielsen et al Nielsen_93. code
Figure 109. Survival rate of faults in Linux device drivers and other Linux subsystems… Data from Palix et al Palix_10b. code
Figure 110. Defects found against hours of testing… Data from Wood Wood_96. code
Figure 111. Percentage of reported problems having a given mean time to first problem occurrence (in months, summed over all installations of a product) for nine products. Data from Adams Adams_84. code
Figure 112. Survival curve of the two most common warnings reported by Splint in Samba and Squid. Data from Di Penta et al Di_penta_09. code
Figure 113. Reported faults against number of installations (upper) and age (lower)… Data from the "wheezy" version of Debian UDD_14. code
Figure 114. Number of various kinds of fault found during code review of nine implementations of the same specification and how located. Data extracted from Finifter Finifter_13b. code

# Source code

Figure 115. Boxplot of ratings given to snippets 1 to 50 by second year students (colors used to help distinguish boxplots for each snippet). code
Figure 116. Aggregated ranking of snippets by subjects in years 1 and 2 (red and black) and years 2 and 4 (black and blue). Snippets have been sorted by year 2 ranking. code
Figure 117. Correlation, using Kendall’s tau, between each subject and their corresponding year aggregate ranking. code
Figure 118. Number of files and lines of code in 3,782 projects on Sourceforge. Data from Herraiz Herraiz_08. code
Figure 119. Total number of C functions measured, their total unused parameters and two fitted models. Data from Jones <book Jones_??>. code
Figure 120. Occurrences of sequences of java.lang.StringBuilder methods called on the same object in 11 GB of Java bytecode. Data from Mendez et al Mendez_13. code
Figure 121. For each class the percentage of method sequences containing a given number of calls (in 11 GB of Java bytecode). Data from Mendez et al Mendez_13. code
Figure 122. Number of commits of a given length, in lines added/deleted to fix various faults in Linux file systems. Data from Lu et al Lu_13. code
Figure 123. "Worth estimate" for identifier visibility ordering preferences declarations within a Java class. Data from Biegel et al Biegel_12. code
Figure 124. "Worth estimate" for the kind of method activity attribute. Data from Biegel et al Biegel_12. code
Figure 125. Number of method calls to Java APIs and non-APIs in 6,286 open source projects. Data from Lämmel et al Lammel_11. code
Figure 126. Percentage occurrence of values appearing as the most significant digit of floating-point, integer and hexadecimal literals in C source code. Data from Jones Jones_05a. code
Figure 127. Lines of code, Halstead’s volume and cyclomatic complexity of Linux version 2.6.9. Data from Israeli et al Israeli_10. code
Figure 128. Number of feature constants against LOC for 40 large C programs and two fitted regression lines (red and green; blue is one confidence interval). Data from Liebig et al Liebig_10. code

# Stories told by data

Figure 129. Years of professional experience in a given language for experimental subjects. Data from Prechelt Prechelt_07. code
Figure 130. Plots of sample values having various visual patterns. code
Figure 131. Total number of lines of C code, in .c and .h files, having a given length, i.e., containing a given number of characters (upper) and tokens (lower). Data from Jones Jones_05a. code
Figure 132. Various measurements of work performed implementing the same functionality, number of lines of Haskell and C implementing functionality, CFP (COSMIC function points; based on user manual) and length of formal specification. Data kindly provided by Staples Staples_13. code
Figure 133. Effort, in hours (log scale), spent in various development phases of projects written in Ada (blue) and Fortran (red). Data from Waligora et al Waligora_95. code
Figure 134. Performance of experts (e) and novices (n) in a test driven development experiment. Data from Muller et al Muller_07. code
Figure 135. Correlations between pairs of attributes of 12,799 Github pull requests to the Homebrew repo, represented using colored ellipses. Data from Gousios et al Gousios_14. code
Figure 136. Correlations between pairs of attributes of 12,799 Github pull requests to the Homebrew repo, represented using pie charts and shaded boxes. Data from Gousios et al Gousios_14. code
Figure 137. Hierarchical cluster of correlation between pairs of attributes of 12,799 Github pull requests to the Homebrew repo. Data from Gousios et al Gousios_14. code
Figure 138. Effort invested in project definition (as percentage of original estimate) against cost overrun (as percentage of original estimate). Data extracted from Gruhl Gruhl_9x. code
Figure 139. Relative clock frequency of cpus when first launched (1970 == 1). Data from Danowitz et al Danowitz_12. code
Figure 140. Year and age at which survey respondents started contributing to FLOSS, i.e., made their first FLOSS contribution. Data from Robles et al Robles_14. code
Figure 141. SPECint results, summed over all distinct values (upper) and summed within equal width bins (lower). Data from SPEC website SPEC_14. code
Figure 142. Kernel density plot of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
Figure 143. Number of commits containing a given number of lines of code made when making various categories of changes to the Linux filesystem code (upper) and a density plot of the same data (lower). Data from Lu et al Lu_13. code
Figure 144. Three commonly used kernel density smoothing functions: gaussian, rectangular and triangular. code
Figure 145. Developer estimated effort against actual effort (in hours), for various maintenance tasks, e.g., adaptive, corrective and perfective; upper as-is, middle jittered values and lower size proportional to the log of the number of measurements. Data from Hatton Hatton_07. code
Figure 146. Number of installations of Debian packages against the age of the package; middle plot was created by smoothScatter and lower plot by contour. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
Figure 147. Number of lines added to glibc each week. Data from González-Barahona et al Gonzalez-Barahona_14. code
Figure 148. Boxplot of time between a bug in Eclipse being reported and the first response to the report; right plot is notched. Data from Breu et al Breu_10. code
Figure 149. Violin plots (left using vioplot, right using beanplot) of time between bug being reported in Eclipse and first response to the report. Data from Breu et al Breu_10. code
Figure 150. Time taken for developers to debug various programs using batch processing or online (i.e., time-sharing) systems. Data kindly provided by Prechelt Prechelt_99a. code
Figure 151. Pairs of languages used together in the same GitHub project with connecting line width, color and transparency related to number of occurrences. Data kindly supplied by Bissyande Bissyande_13. code
Figure 152. References from one document to another in the Microsoft Server Protocol specifications. Data extracted by the author from the 2009 document release WSPP_15. code
Figure 153. Alluvial plot of relative prioritization order of selection and application of Github pull requests. Data from Gousios et al Gousios_15a. code
Figure 154. Intel Sandy Bridge L3 cache bandwidth in GB/s at various clock frequencies and using combinations of cores (0-3 denotes cores zero-through-three, 0,2,4 denotes the three cores zero, two and four). Data from Schone et al Schone_12. code
Figure 155. Contour plot of the number of sessions executed on a computer having a given processor speed and memory capacity. Data kindly provided by Thereska Thereska_10. code
Figure 156. Root source of 1,257 faults and where fixes were applied for 21 large safety critical applications. Data from Hamill et al Hamill_14. code
Figure 157. Ternary plots drawn with two possible visual aids for estimating the position of a point (red plus at x=0.1, y=0.35, z=0.55); axis names appear on the vertex opposite the axis they denote. code
Figure 158. Estimated market share of Android devices by brand and product, based on downloads from 682,000 unique devices in 2015. Data from OpenSignal OpenSignal_15. code
Figure 159. Variables having a given number of read accesses, given 25, 50, 75 and 100 total accesses, calculated from running the weighted preferential attachment algorithm (red), the smoothed data (blue) and a fitted exponential (green). code
Figure 160. Throughput when running the SPEC SDM91 benchmark on a Sun SPARCcenter 2000 containing 8 CPUs, with the predictions from three fitted queuing models. Data from Gunther Gunther_05. code
Figure 161. Illustration of the difference in cognitive effort needed to locate points differing by shape or color. code
Figure 162. The three, seven and twelve color palettes returned by calls to the diverge_hcl, sequential_hcl, rainbow_hcl and rainbow functions. code
Figure 163. Percentage share of the Android market by successive Android releases between 2010 and 2015. Data from Bidouille Bidouille_15. code
Figure 164. Values plotted using a linear (upper) and logarithmic (lower) x-axis. Data from Dunham et al Dunham_86. code
Figure 165. Illustration of U-shape created when y-axis values are a ratio calculated from x-axis values. code
Figure 166. Mean time to fail for systems of various sizes (measured in lines of code); linear y-axis left, log y-axis right. Data extracted from Figure 8.3 of Putnam et al Putnam_92. code
Figure 167. Alternative representation of numeric values in Table. Data from Scott Scott_16. code
Figure 168. What’s up doc? Not the fitted model you were expecting. Equations from White White_12. code

# Probability

Figure 169. Probability that three (red) or four (blue) consecutive false positive warnings occur in some total number of warnings (false positive rate appears on line). code
Figure 170. The relationship between words for tracts of trees in various languages. The interpretation given to words (boundary indicated by the zigzags) in one language may overlap that given in other languages. Adapted from DiMarco et al DiMarco_93.
Figure 171. Relationships between common discrete and continuous probability distributions.
Figure 172. Shapes of commonly encountered discrete probability distributions (upper to lower: Uniform, Geometric, Binomial and Poisson). code
Figure 173. Cumulative density plots of the discrete probability distributions in Figure. code
Figure 174. Commonly encountered continuous probability distributions (upper to lower: Uniform, Exponential, Normal, beta). code
Figure 175. Samples of randomly selected values drawn from the same normal distribution (left: 100 points in each sample, right 1,000 points in each sample). code
Figure 176. Reading rate for text printed using a serif (blue) and sans-serif (red) font, data has been normalised and displayed as a density. Data from Veytsman et al Veytsman_12. code
Figure 177. Probability, with 95% confidence, that shapiro.test correctly reports that samples drawn from various distributions are not drawn from a Normal distribution, and probability of an incorrect report when the sample is drawn from a Normal distribution. code
Figure 178. Number of conditionally compiled code sequences dependent on a given number of feature macros (red overwritten by blue: Linux, blue: FreeBSD). Data from Berger et al Berger_10. code
Figure 179. Percentage occurrence of statements for each of 100 or so C, C++ and Java programs, plotted as a density on the y-axis. Data from Zhu et al Zhu_15. code
Figure 180. A Cullen and Frey graph for the $3n+1$ program length data. Data kindly provided by van der Meulen van_der_Meulen_07. code
Figure 181. Number of 3n+1 programs containing a given number of lines and four distributions fitted to this data. Data kindly provided by van der Meulen van_der_Meulen_07. code
Figure 182. A zero-truncated Negative Binomial distribution fitted to the number of features whose implementation took a given number of elapsed workdays; first 650 days used. Data kindly provided by 7digital 7Digital_12. code
Figure 183. Percentage of function definitions in embedded applications, the SPECint95 benchmark, and the translated form of C source benchmark programs declared to have a given number of parameters. Data for embedded and SPECint95 kindly supplied by Engblom Engblom_99a, C book data from Jones Jones_05a. code
Figure 184. Density plot of MPI micro-benchmark runtime performance for calls to MPI_Scan with 10,000 Bytes (upper) and to MPI_Allreduce with 1,000 Bytes (lower). Data kindly supplied by Hunold Hunold_14. code
Figure 185. Mixture model fitted by the normalmixEM function to the performance data from calls to MPI_Allreduce. Data kindly supplied by Hunold Hunold_14. code
Figure 186. Density plot of accesses to one article on Slashdot, in minutes since its publication. The distinct Normal distributions (colored and fitted to the log of the data) contained in the mixture models fitted by the REBMIX (upper) and normalmixEM (lower) functions. Data kindly supplied by Kaltenbrunner Kaltenbrunner_07. code
Figure 187. Cumulative probability distribution of file size (red) and of number of bytes occupied in a file system (blue). Data from Irlam Irlam_93. code
Figure 188. Graph of available state transitions for Alaris volumetric infusion pump (the button presses that cause transitions between states are not shown). Data kindly supplied by Oladimeji Oladimeji_08. code
Figure 189. Discrete-time Markov chain for created/modified/deleted status of Linux kernel files at each major release from versions 2.6.0 to 2.6.39. Data from Tarasov Tarasov_12. code
Figure 190. Directed graph of emails between FreeBSD and OpenBSD developers, plus a few people involved in both discussions, with developers who sent/received less than four emails removed. Data from Canfora et al Canfora_11. code
Figure 191. Expected probability of a single instance (y-axis) against the probability of a measured struct type having grouped member types (x-axis); when both probabilities are the same points will be along the blue line. Data from Jones Jones_09b. code

# Statistics for software engineering

Figure 192. Example of a sample drawn from a population. code
Figure 193. Date of introduction of a cpu against its commercial lifetime. Data from Culver Culver_10. code
Figure 194. A population of items having one of three colors and three strata sampled from it. code
Figure 195. Power consumed by three SERT benchmark programs at various levels of system load; crosses at 2% load intervals, lines based on 10% load intervals. Data kindly provided by Kistowski Kistowski_15. code
Figure 196. Distribution of 4,000 sample means for two sample sizes drawn from exponential (left), lognormal (center) and Pareto (right) distributions, vertical lines are 95% confidence bounds. The blue curve is the Normal distribution predicted by theory. code
Figure 197. Mean (red) and standard deviation (grey lines; they are not symmetrical because of the log scaling) of samples of 3 items drawn from a population of 1,000 items (blue line mean, green line standard deviation). Data kindly provided by Chen Chen_12. code
Figure 198. Density plot of mean of samples containing 3 or 12 items randomly selected from a data set of 1,000 items; process repeated 1,000 times for each sample size. Data kindly provided by Chen Chen_12. code
Figure 199. Number of commits to glibc for each day of the week, for the years from 1991 to 2012. Data from González-Barahona et al Gonzalez-Barahona_14. code
Figure 200. A Normal distribution with mean=4 and variance=8 and a Chi-squared distribution with four degrees of freedom having the same mean and variance (the vertical lines are at the distributions' median value). code
Figure 201. Density plot of execution time of 1,000 input data sets, with lines marking the mean, median and mode. Data kindly supplied by Chen Chen_12. code
Figure 202. Impact of serial correlation, AR(1) in this example, on the calculated mean (upper) and standard deviation (lower) of a sample (the legends specify the amount of serial correlation). code
Figure 203. Occurrence of sample median and mean values for 1,000 samples drawn from a binomial distribution. code
Figure 204. A contaminated normal, values drawn from two normal distributions with 10% of values drawn from a distribution having a standard deviation five times greater than the other. code
Figure 205. Regression model (red line; pvalue=0.02) fitted to the number of correct/false security code review reports made by 30 professionals; blue lines are 95% confidence intervals. Data from Edmundson et al Edmundson_13. code
Figure 206. Bootstrapped regression lines fitted to random samples of the number of correct/false security code review reports made by 30 professionals. Data from Edmundson et al Edmundson_13. code
Figure 207. Kernel density plot, with 95% confidence interval, of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
Figure 208. The four related quantities in the design of experiments. code
Figure 209. Examples of the impact of population prevalence, statistical power and p-value on number of false positives and false negatives. code
Figure 210. Visualization of Cohen’s $d$ for two normal distributions having different means and the same standard deviation (two left) and both different (right). code
Figure 211. The impact of differences in mean and standard deviation on the overlap between two populations ($\alpha$: probability of making a false positive error, and $\beta$: probability of making a false negative error). code
Figure 212. The power of a statistical test at detecting that a difference exists between the mean values of two samples drawn from two populations, both having a Normal distribution. code
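
The sampling behaviour illustrated in Figures 196 and 198 (the spread of sample means narrows as sample size grows) can be sketched as follows. This is a minimal Python sketch, not the code behind the figures; the exponential population, the seed, the sample sizes and the helper name `sample_means` are assumptions chosen for illustration.

```python
import random
import statistics

def sample_means(draw, sample_size, repetitions, rng):
    """Return the mean of each of `repetitions` samples of `sample_size` values."""
    return [statistics.fmean(draw(rng) for _ in range(sample_size))
            for _ in range(repetitions)]

def draw_exponential(rng):
    """One value from an exponential population having mean 1 and sd 1."""
    return rng.expovariate(1.0)

rng = random.Random(42)   # fixed seed so that runs are repeatable (an assumption)
spread = {}
for n in (3, 12):
    means = sample_means(draw_exponential, n, 4000, rng)
    spread[n] = statistics.stdev(means)
    # Theory: the standard deviation of the sample mean is sd/sqrt(n)
    print(n, round(statistics.fmean(means), 3),
          round(spread[n], 3), round(1 / n ** 0.5, 3))
```

The last two printed columns, observed and predicted spread, should be close for both sample sizes; the shape of the distribution of means approaches the Normal more slowly for heavy-tailed populations.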

# Regression modeling

Figure 213. Relationship between data characteristics (edge labels) and applicable techniques (node labels) for building regression models.
Figure 214. Total lines of source code in FreeBSD by days elapsed since the project started (in 1993). Data from Herraiz Herraiz_08. code
Figure 215. Estimated cost and duration of 73 large Dutch federal IT projects, along with fitted model and 95% confidence intervals. Data from Kampstra et al Kampstra_09. code
Figure 216. Number of updates and fixes in each Linux release between version 2.6.11 and 3.2. Data from Corbet et al Corbet_12. code
Figure 217. The number of commits made and the number of contributing developers for Linux versions 2.6.0 to 3.12. The green line in the right plot is the regression model fitted by switching the x/y values. Data from Kroah-Hartman Kroah-Hartman_14. code
Figure 218. Effort/Size of various projects and regression lines fitted using Effort as the response variable (red, with green 95% confidence intervals) and Size as the response variable (blue). Data from Jørgensen et al <book Jorgensen_0?>. code
Figure 219. Lines of code in every initial release (i.e., excluding bug-fix versions of a release) of the Linux kernel since version 1.0, along with fitted straight line (upper) and quadratic (lower) regression models. Data from Israeli et al Israeli_10. code
Figure 220. Actual (left of vertical line) and predicted (right of vertical line) total lines of code in Linux at a given number of days since the release of version 1.0, derived from a regression model built from fitting a cubic polynomial to the data (dashed lines are 95% confidence bounds). Data from Israeli et al Israeli_10. code
Figure 221. Number of classes in the Groovy compiler at each release, in days since version 1.0. Data from Vasa Vasa_10. code
Figure 222. For each distinct language, the number of lines committed on Github and the number of questions tagged with that language. Data from Kunst Kunst_13. code
Figure 223. Percentage of vulnerabilities detected by developers working a given number of years in security. Data extracted from Edmundson et al Edmundson_13. code
Figure 224. Hours to develop software for 29 embedded consumer products and the amount of code they contain, with fitted regression model and loess fit (yellow). Data from Fenton et al Fenton_08. code
Figure 225. Points remaining after removal of overly influential observations, repeatedly applying Cook’s distance and Studentized residuals. Data from Fenton et al Fenton_08. code
Figure 226. Points remaining after removal of overly influential observations, also taking into account the Bonferroni p-value of the Studentized residuals; the line shows the fitted model and 95% confidence interval (loess fit in yellow). Data from Fenton et al Fenton_08. code
Figure 227. influenceIndexPlot for the model having the fitted line shown in Figure. Data from Fenton et al Fenton_08. code
Figure 228. Number of medical devices reported recalled by the US Food and Drug Administration, in two week bins. Upper: fitted straight line and confidence bounds, with loess fit (green); Lower: straight line (purple) fitted after two outliers replaced by mean and original fit (red). Data from Alemzadeh et al Alemzadeh_13. code
Figure 229. influenceIndexPlot of data from Alemzadeh et al Alemzadeh_13. code
Figure 230. Two fitted straight lines and confidence intervals, one up to the end of 2010 and one after 2010. Data from Alemzadeh et al Alemzadeh_13. code
Figure 231. Results from various studies of software requirements function points counted using COSMIC and FPA; lines are loess fits to studies based on industry and academic counters. Data from Amiri et al Amiri_11. code
Figure 232. Five different equations fitted to the Embedded subset of the COCOMO 81 data before influential observation removal (upper) and after influential observation removal (lower). Data from Boehm Boehm_81. code
Figure 233. Anscombe data sets with Pearson correlation coefficient, mean, standard deviation, and line fitted using linear regression. Data from Anscombe Anscombe_73. code
Figure 234. Residual of the straight line fit to the Linux growth data analysed in Figure (upper) and data+straight line fit (red) and loess fit (blue). Data from Israeli et al Israeli_10. code
Figure 235. Array element assignment benchmark compiled with gcc using the O0 (upper) and O3 (lower) options (measurements were grouped into runs of 2,000 executions). Data from Flater et al Flater_13. code
Figure 236. Number of installations of Debian packages against the age of the package, plus fitted model and loess fit. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
Figure 237. Quadratic relationship with various amounts of added noise fitted using a quadratic and exponential model. code
Figure 238. Author workload against number of activity types per author (upper) and ratio test (lower). Data from Vasilescu et al Vasilescu_12. code
Figure 239. Change points detected by cpt.mean, upper using method="AMOC" and lower using method="PELT". Data from Alemzadeh et al Alemzadeh_13. code
Figure 240. Number of flags (y-axis jittered) used to control the selection of optional features in systems containing a given total number of features, with loess curve (red) and regression line (green). Data from Berger et al Berger_12. code
Figure 241. Monthly unit sales (in thousands) of 4-bit microprocessors. Data kindly supplied by Turley Turley_02. code
Figure 242. Fitted regression line to points (in red) and 3-D illustration of assumed Normal distribution of errors. code
Figure 243. Number of vulnerabilities detected by professional developers with web security review experience; upper: technically correct plot of model fitted using a Poisson distribution, lower: easier to interpret curve representation of fitted regression models, assuming the error has a Poisson distribution (continuous lines) or a Normal distribution (dashed lines). Data extracted from Edmundson et al Edmundson_13. code
Figure 244. Number of functions containing a given number of break statements and a fitted Negative Binomial distribution. Data from Jones Jones_05a. code
Figure 245. Code review meeting duration for a given number of non-comment lines of code. Fitted regression model, assuming errors have a Gamma distribution (red, with confidence interval in blue) or a Normal distribution (green). Data from Porter et al Porter_98. code
Figure 246. Number of APIs used in Java programs containing a given number of lines and three fitted models. Data from Starek Starek_10. code
Figure 247. Yearly development cost and lines of Fortran code delivered to the US Air Force between 1962 and 1984, with fitted regression models. Data extracted from NeSmith NeSmith_86. code
Figure 248. Maintenance task effort and lines of code added+updated, with fitted regression model (red) and SIMEX adjusted for 10% error (blue). Data from Jørgensen Jorgensen_95. code
Figure 249. Regression modeling 0/1 data with a straight line and a logistic equation. code
Figure 250. ROC curve for the data listed in Table. code
Figure 251. Percentage of mutants killed at various percentage of path coverage for 300 or so Java projects; fitted Beta (red) and glm (blue) regression models. Data from Gopinath et al Gopinath_14. code
Figure 252. SPECint 2006 performance results for processors running at various clock rates, memory chip frequencies and processor family. Data from SPEC SPEC_14. code
Figure 253. Component+residual plots for three explanatory variables in a fitted SPECint model. code
Figure 254. Individual contribution of each explanatory variable to the response variable in a quadratic model of SPECint performance. code
Figure 255. Estimated and actual effort broken down by communication frequency, along with individually fitted straight lines. Data from Moløkken-Østvold et al Molokken_Ostvold_07. code
Figure 256. Illustration of the shared and non-shared contributions made by two explanatory variables to the response variable Y. code
Figure 257. pairs plot of lines added/modified/removed, growth and number of files and total lines in versions 2.6.0 through 3.9 of the Linux kernel. Data from Kroah-Hartman Kroah-Hartman_14. code
Figure 258. Example plots of functions listed in Table. These equations can be inverted, so they start high and go down. code
Figure 259. Time to execute a computational biology program on systems containing processors with various L2 cache sizes. Data kindly provided by Hazelhurst Hazelhurst_10. code
Figure 260. A logistic equation fitted to the lines of code in every non-bugfix release of the Linux kernel since version 1.0. Data from Israeli et al Israeli_10. code
Figure 261. Predictions by logistic equations fitted to Linux SLOC data, using subsets of the data up to 2,900, 3,650 and 4,200 days, and all days, since the release of version 1.0. Data from Israeli et al Israeli_10. code
Figure 262. Increase in areal density of hard disks entering production over time. Data from Grochowski et al Grochowski_12. code
Figure 263. Lines of code in the GNU C library against days since 1 January 1990. Data from González-Barahona Gonzalez-Barahona_14. code
Figure 264. Number of failing programs caused by unique faults in gcc (upper) and SpiderMonkey (lower). Fitted model in green, with two exponential components in red and blue. Data kindly provided by Chen Chen_13. code
Figure 265. Power law (red) and exponential (blue) fits to feature macro usage in 20 systems written in C; fail to reject p-value for 20 systems is 0.64. Data from Queiroz et al Queiroz_15. code
Figure 266. Power consumption of six different Intel Core i5-540M processors running at various frequencies; colored lines denote fitted regression models for each processor. Data from Balaji et al Balaji_12. code
Figure 267. Example showing the three ways of structuring a mixed effects model, i.e., different intersections/same slope (upper), same intersection/different slopes (middle) and different intersections/slopes (lower). code
Figure 268. Confidence intervals, 95%, for within-subject intercept and slope (right plots) of mixed-effect models in the adjacent code. code
Figure 269. The three components of the hourly rate of commits, during a week, to the Linux kernel source tree; components extracted from the time series by stl. Data from Eyolfson et al Eyolfson_11. code
Figure 270. Autocorrelation of number of defects found on a given day, for development project C. Data kindly provided by Buettner Buettner_08. code
Figure 271. Autocorrelation of two AR models (upper plots) and two MA models (lower plots). code
Figure 272. Partial autocorrelation of same two AR models (upper plots) and two MA models (lower plots) shown in Figure. code
Figure 273. Autocorrelation of indentation of source code written in various languages. Data from Hindle et al Hindle_08. code
Figure 274. Number of features started for each day and fitted regression trend line (left) and number of features after subtracting the trend (right), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
Figure 275. Autocorrelation (left) and partial autocorrelation (right) of the number of features started on a given day (after differencing the log transformed data), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
Figure 276. Predicted daily difference in the number of new feature starts (red) and 95% confidence intervals (blue). Data kindly supplied by 7Digital 7Digital_12. code
Figure 277. Time series whose values are uncorrelated (upper), but whose squared values are correlated (lower); see code for generation process. code
Figure 278. Cross correlation of feature release ‘size’ (upper non-bugfix releases, lower all releases) and date when bugs are prioritised. Data kindly supplied by 7Digital 7Digital_12. code
Figure 279. Estimated staff working on a project during every week. Data from Buettner Buettner_08. code
Figure 280. Market share of Firefox version 3.0 fitted using loess regression with various values of the span option. Data from W3Counter W3Counter_14. code
Figure 281. Cross-correlation of source lines added/deleted per week to the glibc library. Data from González-Barahona Gonzalez-Barahona_14. code
Figure 282. Visualization of alignment between weekly time series of lines of code in NetBSD (blue) and FreeBSD (red). Data from Herraiz Herraiz_08. code
Figure 283. Effort distribution (person hours) over the eight main tasks of a development project at Rolls-Royce and a hierarchical clustering of each task effort time series based on pair-wise correlation and Euclidean distance metrics. Data extracted from Powell Powell_01. code
Figure 284. Two commonly used hazard functions: Weibull is monotonic (always increases, decreases or remains the same), while Lognormal can increase and then decrease. code
Figure 285. Observation period with events inside and outside the study period. code
Figure 286. The Kaplan-Meier curve for survivability of new releases: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals. Data from Businge Businge_13. code
Figure 287. The Kaplan-Meier curve for survivability of ETPs’ ability to be built using SDKs released in subsequent years: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals, with plus signs, +, indicating censored data. Data from Businge Businge_13. code
Figure 288. Kaplan-Meier curves for time-to-fix…. Data from Arora et al Arora_10. code
Figure 289. Survival curve after adjustment for explanatory variables… code
Figure 290. Cumulative incidence curves for problems reported by the splint tool in Samba and Squid (time is measured in number of snapshot releases). Data from Di Penta et al Di_penta_09. code
Figure 291. Rose diagram of number of commits in each 3 hour period of a day for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
Figure 292. The Cartwright (red; dcarthwrite), wrapped Cauchy (green; dwrappedcauchy) and wrapped von Mises (blue; dvonmises) circular probability distributions for various values of their parameters. code
Figure 293. Asymmetric extended wrapped forms of the Cardioid (upper), von Mises (middle) and Cauchy (lower) probability distributions for various values of their parameters. code
Figure 294. Number of commits (upper) and number of commits in which a fault was detected (lower) by hour of day of the commit, for Linux. Data from Eyolfson et al Eyolfson_14. code
Figure 295. Number of commits per hour for weekdays and fitted model (upper) and number of commits in which a fault was detected (lower), for Linux. Data from Eyolfson et al Eyolfson_14. code
Figure 296. Number of commits per hour for each weekday, fitted using $\cos(...\cos...)$ (upper) and $\cos(...\cos+\sin...)$ (lower), for Linux; in both cases the fitted fault model (red) has been rescaled to allow comparison. Data from Eyolfson et al Eyolfson_14. code
Figure 297. Application source lines against percentage of covered lines achieved by both Human & Dynodroid tests, by only Dynodroid tests and by only Human tests. Data from Machiry et al Machiry_13. code
Figure 298. Percentage of source lines covered by both Human & Dynodroid tests, by only Dynodroid tests and by only Human tests; fitted regression line and prediction points for various total source lines, red plus. Data from Machiry et al Machiry_13. code
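
The Kaplan-Meier curves in Figures 286 to 288 estimate survival when some subjects are censored (still surviving when observation stopped). The calculation can be sketched as follows; this is a minimal Python sketch under stated assumptions, not the code behind the figures. The function name `kaplan_meier` and the data values are hypothetical, and confidence intervals are omitted.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each time an event occurred.

    durations: time at which each subject failed or was censored.
    observed:  True if the event was seen, False if the subject was censored.
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    leaving = Counter(durations)      # events plus censorings at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = events.get(t, 0)
        if d:
            # Multiply by the fraction of at-risk subjects surviving time t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]         # censored subjects leave without an event
    return curve

# Hypothetical data: years until failure; False marks a censored subject
durations = [1, 2, 2, 3, 4, 4, 5, 5]
observed = [True, True, False, True, False, True, False, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(t, round(s, 3))
```

Censored subjects count as being at risk up to, and including, the time at which they were censored, which is the usual convention; they reduce the at-risk count afterwards without lowering the survival estimate.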

# Other techniques

Figure 299. Volume of unit sphere in 1 to 50 dimensions, e.g., a sphere has volume $\frac{4}{3}\pi$ in three dimensions. code
Figure 300. Top levels of the decision tree built from the reopened fault data. Data from Shihab et al Shihab_10a. code
Figure 301. A Bertin plot for items included in the same data structure as ‘Antibiotics used’, for each subject, after reordering by seriate. Data from Jones Jones_09b. code
Figure 302. A visualization of the Robinson matrix based on number of times pairs of items co-occur in the same data structure (the closer to the diagonal the more often they occur together). Data from Jones Jones_09b. code
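
The volumes plotted in Figure 299 follow from the standard closed form for the volume of a unit ball in $n$ dimensions, $\pi^{n/2}/\Gamma(n/2+1)$. The following is a minimal Python sketch of that calculation, not the code behind the figure; the function name `unit_ball_volume` is an assumption chosen for illustration.

```python
import math

def unit_ball_volume(n):
    """Volume of the unit ball in n dimensions: pi^(n/2) / Gamma(n/2 + 1)."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)

volumes = {n: unit_ball_volume(n) for n in range(1, 51)}
largest = max(volumes, key=volumes.get)   # dimension having the largest volume
print(largest, round(volumes[largest], 3))
print(volumes[50])
```

The volume rises to a maximum at five dimensions and then shrinks rapidly towards zero as the number of dimensions increases.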

# Experiments

Figure 303. Time taken, by the same person, to implement 12 algorithms from the Communications of the ACM, with four iterations of the implementation process. Data from Zislis Zislis_73. code
Figure 304. Time taken to transfer and multiply 2-dimensional matrices of various sizes on a GTX 480 GPU. Data kindly supplied by Gregg and Hazelwood Gregg_11. code
Figure 305. Relative performance (y-axis) of libraries optimized to run on various processors (x-axis). Data from Bird Bird_10. code
Figure 306. Number of integer constants having the lexical form of a decimal-constant (the literal 0 is also included in this set) and hexadecimal-constant that have a given value. Data from Jones Jones_05a. code
Figure 307. One and two-sided significance testing. code
Figure 308. A cube plot of three configuration factors and corresponding benchmark results (blue) from the Memo table experiment. Data from Citron et al Citron_03b. code
Figure 309. Design plot showing the impact of each Memo table configuration factor on benchmark performance. Data from Citron et al Citron_03b. code
Figure 310. Interaction plot showing how cint changes with size for given values of associativity and mapping. Data from Citron et al Citron_03b. code
Figure 311. Number of Reflection benchmark results achieving a given score, reported for GTX 970 cards from three third-party manufacturers. Data extracted from UserBenchmark.com. code
Figure 312. Density plots of project bids submitted by companies before/after seeing a requirements document. Data from Jørgensen et al Jorgensen_04c. code
Figure 313. Density plot of task implementation estimates: with no instructions (red) and with instruction on what to do (blue). Data from Jørgensen et al Jorgensen_04. code
Figure 314. Examples of correlation between samples of two value pairs, plotted on x and y axis. code
Figure 315. Number of software faults having a given consequence, based on an analysis of faults in Cassandra. Data from Gunawi et al Gunawi_14. code
Figure 316. Performance and rental cost of early computers, with straight line fits for a few years. Data from Knight Knight_66. code
Figure 317. Feature size, in Silicon atoms, of microprocessors. Data from Danowitz et al Danowitz_12. code
Figure 318. Maximum number of records sorted in 1 minute and using 1 penny’s worth of system time (upper). SPEC2006 integer benchmark results (lower). Data from Gray et al Gray_14 and SPEC SPEC_14. code
Figure 319. Total system power consumed when sorting 10, 20, 30, 40, 50 million integers (colored pluses) using three techniques running on the same processor at different clock frequencies. Data from Götz et al Gotz_14. code
Figure 320. Power consumed by 10 Atmel SAM3U microcontrollers at various temperatures when sleeping or running. Data from Wanner et al Wanner_10. code
Figure 321. Power spectrum of electrical power consumed by an app running on a ???. Data from Saborido et al Saborido_15. code
Figure 322. Read bandwidth at various offsets for new disks sold in 2002 (upper) and 2006 (lower). Data kindly provided by Krevat Krevat_13. code
Figure 323. Average power consumed by one server’s CPU (four Pentium 4 Xeons; red) and memory (8 GB PC133 DIMMs; blue) running the SPEC CPU2006 benchmark (upper) and breakdown by system component when executing various programs. Data from Bircher Bircher_10. code
Figure 324. FFT benchmark executed 2,048 times followed by system reboot, repeated 10 times. Data kindly provided by Kalibera Kalibera_05. code
Figure 325. Percentage change, relative to no environment variables, in perlbench performance as characters are added to the environment. Data extracted from Mytkowicz et al Mytkowicz_08. code
Figure 326. Changes in SPEC CPU2006 benchmark performance caused by cache and memory bus contention for one dual processor Intel Xeon E5345 system. Data kindly provided by Babka Babka_12. code
Figure 327. Execution time of 330.art_m, an OpenMP benchmark program, using different compilers, number of threads and setting of thread affinity. Data kindly provided by Mazouz Mazouz_13. code
Figure 328. Access times when walking through memory using three fixed stride patterns (i.e., 32, 64 and 128 bytes) on a quad-core Intel Xeon E5345; grey lines at one standard deviation. Data kindly provided by Babka Babka_09. code
Figure 329. Performance variation of programs from the Talos benchmark run on original OS and a stabilised OS. Data from Larres Larres_12. code
Figure 330. Operations per second of a file-server mounted on one of ext2, ext3, rfs and xfs filesystems (same color for each filesystem) using various options. Data kindly supplied by Huang Zhou_12. code
Figure 331. Percentage change in SPEC number, relative to version 4.0.4, for 12 programs compiled using six different versions of gcc (compiling to 64-bits with the O3 option). Data from Makarow Makarow_14. code
Figure 332. Execution time of xy file compressor, compiled with gcc using various optimization options, running on various systems (lines are mean execution time when compiled using each option). Data kindly supplied by Petkovich de_Oliveira_13. code
Figure 333. Execution time of Perlbench, from SPEC benchmark, on six systems, when linked in three different orders and address randomization on/off. Data kindly supplied by Reidemeister de_Oliveira_13. code
Figure 334. Performance of PassMark memory benchmark on 783 Intel Core i7-3770K systems; lower plot created by trimming 10% of values from the ends of what appears in the upper plot. Data kindly supplied by David Wren PassMark_14. code
Figure 335. Ubench cpu performance on small (upper) and large (lower) EC2 instances, Europe in red and US in green. Data kindly provided by Dittrich Schad_10. code
Figure 336. Lines of code that 101 professional developers, with a given number of years experience, estimate they have written. Data from Jones Jones_06a, Jones_08a, Jones_09b. code

# Overview of R

Figure 337. Plot produced by hello_world.R program. code
Figure 338. The unique bytes per window (256 bytes wide) of a pdf file. code
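
The measurement plotted in Figure 338 can be sketched as follows. This is a minimal Python sketch, not the code behind the figure; it assumes consecutive, non-overlapping windows (the caption does not say whether windows overlap), and the function name `unique_bytes_per_window` and the input bytes are hypothetical.

```python
def unique_bytes_per_window(data, width=256):
    """Count the distinct byte values in each consecutive, non-overlapping window."""
    return [len(set(data[i:i + width])) for i in range(0, len(data), width)]

# Hypothetical input: 256 zero bytes followed by every byte value once
data = bytes(256) + bytes(range(256))
counts = unique_bytes_per_window(data)
print(counts)
```

Windows containing few distinct values suggest padding or highly repetitive content, while windows where nearly all 256 values appear are typical of compressed or encrypted data.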

# Data preparation

Figure 339. Screen height and width reported by 682,000 unique devices that downloaded an App from OpenSignal in 2015 (upper), values switched to ensure height is always the largest value (lower). Data from OpenSignal OpenSignal_15. code
Figure 340. Number of reported vulnerabilities, per day, in the US National Vulnerability Database for 2003. Data from the National Vulnerability Database NVD_14. code
Figure 341. Percentage occurrence of the first digit of hexadecimal numbers in C source and estimated from Google book data. Data from Jones Jones_05a and Michel et al Michel_11. code
Figure 342. Number of processes executing for a given amount of time, with measurements expressed using two and six significant digits. Data from Feitelson Feitelson_14. code