Read me 1st

All of the figures from the book, with links to the code on GitHub and to a Google search for the paper from which the data was obtained.

Introduction

Figure 1. Total cost of one million computing operations over time. Data from Nordhaus Nordhaus_01. code
Figure 2. Storage cost, in US dollars per Mbyte, of mass market technologies over time. Data from McCallum McCallum_16. code
Figure 3. Maximum speed achieved by vehicles over the surface of the Earth and in the air, over time. Data from Lienhard Lienhard_06. code
Figure 4. Growth of transport and product distribution infrastructure in the USA (underlying data is measured in miles). Data from Grübler et al Grubler_91. code
Figure 5. Market capitalization of IBM, Microsoft and Apple (top) and expressed as a percentage of the top 100 listed US tech companies (bottom). Data extracted from the Economist website Economist_15. code
Figure 6. Unit sales of processors used in various ecosystems. Data from Gordon Gordon_87 (mainframes and minicomputers) and Hilbert et al Hilbert_11 (post 1985 hardware). code
Figure 7. Total investment in tangible and intangible assets by UK companies, based on their audited accounts. Data from Goodridge et al Goodridge_14. code
Figure 8. Billions of dollars of worldwide semiconductor sales per month. Data from World Semiconductor Trade Statistics WSTs_16. code
Figure 9. Changing habits in men’s facial hair. Data from Robinson Robinson_76. code
Figure 10. Number of papers, in each year between 1987 and 2003, associated with a particular IT topic. The E-commerce paper count peaks at 1,775 in 2000 and in 2003 is still off the scale compared to other topics. Data kindly provided by Wang Wang_10. code
Figure 11. Normal distribution with total percentage of values enclosed within a given number of standard deviations. code
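The percentages shown in Figure 11 can be reproduced from the normal cumulative distribution: the fraction of values within k standard deviations of the mean is erf(k/sqrt(2)). A minimal Python sketch of that calculation follows; the function name `fraction_within` is illustrative and is not taken from the book's code.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} sd: {100 * fraction_within(k):.2f}%")
```

For one, two and three standard deviations this gives roughly 68.3%, 95.4% and 99.7%.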

Human cognitive characteristics

Figure 12. Unless cognition and the environment in which it operates closely mesh together, no problems are solved; the blades of a pair of scissors need to closely mesh for cutting to occur. code
Figure 13. The assumption of light shining from above creates the appearance of bumps and pits. Could be more convincing hemispheres with light shining from above and below… code
Figure 14. Probability that rat N1 will press a lever a given number of times before pressing a second lever to obtain food, when the target count is 4, 8, 12 and 16. Data extracted from Mechner Mechner_58. code
Figure 15. Boy/girl (aged 11-12 years) verbal reasoning, quantitative reasoning, non-verbal reasoning and mean CAT score over the three tests; each stanine band is 0.5 standard deviations wide. Data from Strand et al Strand_06. code
Figure 16. Rotate text in the real world, by tilting the head, or in the mind? code
Figure 17. Two objects paired with another object that may be a rotated version. Based on Shepard et al Shepard_71. code
Figure 18. Error rate, with standard error, for the left/right hand in a study of the SNARC effect. Data from Nuerk et al Nuerk_05. code
Figure 19. Structure of mammalian long-term memory subsystems; brain areas in red. Based on Squire et al Squire_15.
Figure 20. Percentage correct answers to questions about binary operator precedence against occurrence in source code. Data from Jones Jones_06a. code
Figure 21. Response time (left axis) and error percentage (right axis) on reasoning task with given number of digits held in memory. Data extracted from Baddeley Baddeley_09. code
Figure 22. Major components of working memory: working memory in yellow, long-term memory in orange. Based on Baddeley Baddeley_12. code
Figure 23. Yes/no response time (in milliseconds) as a function of the number of digits held in memory. Data extracted from Sternberg Sternberg_69. code
Figure 24. Parse tree of a sentence with no embedding, upper "S 1", and a sentence with four degrees of embedding, lower "S 4". Based on Miller et al Miller_64. code
Figure 25. Sequencing errors (as percentage) after interruptions of various length (red), including 95% confidence intervals, normal sequence error rate in green; lines are fitted model predictions. Data from Altmann et al Altmann_17. code
Figure 26. Semantic memory representation of alphabetic letters (the numbers listed along the top are place markers and are not stored in subject memory). Readers may recognize the structure of a nursery rhyme in the letter sequences. Derived from Klahr Klahr_83. code
Figure 27. Probability of correct recall of words by serial presentation order (each word visible for 1 or 2 seconds, last digit in legend). Data extracted from Murdoch Murdoch_62. code
Figure 28. Time taken to solve the same jig-saw puzzle 35 times, followed by a two-week interval and then another 35 times, with power law and exponential fits. Data extracted from Alteneder Alteneder_35. code
Figure 29. Completion times of eight solo (upper) and eight pairs (lower) for each implementation round, along with fitted equation…. Data kindly provided by Lui Lui_06. code
Figure 30. Subjects' belief response curves for positive weak–strong, negative weak–strong, and positive–negative evidence. Based on Hogarth et al Hogarth_92. code
Figure 31. Country boundaries distort judgement of relative city locations. Based on Stevens et al Stevens_78.
Figure 32. Orthogonal representation of shape, color and size stimuli. Based on Shepard Shepard_61.
Figure 33. The six unique configurations of selecting four times from eight possibilities, i.e., it is not possible to rotate one configuration into another within these six configurations. Based on Shepard Shepard_61.
Figure 34. Percentage of correct answers given by one subject, against boolean-complexity of category, colored by number of positive cases needed to define the category. Data kindly provided by Feldman Feldman_00. code
Figure 35. The Berlin and Kay Berlin_69 language color hierarchy. The presence of any color term in a language implies the existence, in that language, of all terms below it. Papuan Dani has two terms (black and white), while Russian has eleven (Russian may also be an exception in that it has two terms for blue.) code
Figure 36. Cup- and bowl-like objects of various widths (ratios 1.2, 1.5, 1.9, and 2.5) and heights (ratios 1.2, 1.5, 1.9, and 2.4). The percentage of subjects who selected the term cup or bowl to describe the object they were shown (the paper did not explain why the figures do not sum to 100%). Based on Labov Labov_73. code
Figure 37. A commercial event involving a buyer, seller, money, and goods; as seen from the buy, sell, pay, or charge perspective. Based on Fillmore Fillmore_77. code
Figure 38. Lines of code correctly recalled after a given number of 2 minute memorization sessions; upper plot actual program, lower plot line order scrambled. Data extracted from McKeithen et al McKeithen_81. code
Figure 39. Examples of features that may be preattentively processed (parallel lines and the junction of two lines are the odd ones out). Based on Ware Ware_00.
Figure 40. Continuity: upper left plot is perceived as two curved lines; Closure: when the two perceived lines are joined at their ends (upper right), the perception changes to one of two cone-shaped objects; Symmetry and parallelism: where the direction taken by one line follows the same pattern of behavior as another line; Proximity: the horizontal distance between the dots in the lower left plot is less than the vertical distance, causing them to be perceptually grouped into lines (the relative distances are reversed in the right plot); Similarity: a variety of dimensions along which visual items can differ sufficiently to cause them to be perceived as being distinct; rotating two line segments by 180° does not create as big a perceived difference as rotating them by 45°. code
Figure 41. Perceived grouping of items on a line may be by shape, color or proximity. Based on Kubovy et al kubovy_08. code
Figure 42. Examples of unique items among visually similar items. Those at the top include an item that has a distinguishing feature (a vertical line or a gap); those underneath them include an item that is missing this distinguishing feature. Based on displays used by Treisman et al Treisman_85. code
Figure 43. The foveal, parafoveal and peripheral vision regions when three characters visually subtend 3°. Based on Schotter et al Schotter_12. code
Figure 44. Local context can change the interpretation given to the surrounding symbols. code
Figure 45. Example object layout and the corresponding ordered tree produced from the answers given by one subject. Data extracted from McNamara et al McNamara_89. code
Figure 46. Heat map of one subject’s cumulative fixations (black dots) on a screen image. Data kindly provided by Ali Ali_12. code
Figure 47. The four cards used in the Wason selection task. Based on Wason Wason_68. code
Figure 48. Probability a subject will successfully distinguish a difference between the number of dots displayed and a specified target number (x-axis is the difference between these two values). Data extracted from van Oeffelen et al van_Oeffelen_82. code
Figure 49. Line locations chosen for the numeric values seen by each of four subjects; color of fitted loess line changes at one million boundary. Data kindly provided by Landy Landy_16. code
Figure 50. Number of errors, in 132 simple multiplication trials (e.g., $3\times7$), upper plot shows operand values (a loess fit in yellow) and lower plot result value (points where both operands have the same value are in blue). Data from Campbell Campbell_97. code
Figure 51. One subject’s response time over successive blocks of command line trials and fitted loess (in green). Data kindly provided by Remington Remington_16. code
Figure 52. Risk neutral (green, $u(w)=w$), risk loving (red, quadratic) and risk averse (blue, square-root) utility functions. code
Figure 53. Subjects' estimate of their ability (x-axis) to correctly answer a question and actual performance in answering on the left scale. The responses of a person with perfect self-knowledge are given by the solid line. Data extracted from Lichtenstein et al Lichtenstein_77. code
Figure 54. Each row shows a scaled version of the three stripes, along with actual lengths in inches, from which subjects were asked to select the longest. Based on Asch Asch_56. code

Economics

Figure 55. Company revenue (in millions of dollars) against total software development costs. Data from Mulford et al Mulford_16. code
Figure 56. Average Return On Invested Capital of various U.S. industries between 1992-2006. Data from Porter Porter_08. code
Figure 57. Ratio of actual to estimated hours of effort to enhance an existing product, for 25 versions of one application. Data from Huijgens et al Huijgens_16. code
Figure 58. Accounting practice for breaking down income from sales… code
Figure 59. Average effort (in days) used to fix a defect detected in a given phase (x-axis) that had been introduced in an earlier phase (colored lines); total of 38,120 defects in projects at Hughes Aircraft. Data extracted from Willis et al Willis_98. code
Figure 60. Months of developer effort needed to produce systems containing a given number of lines of code… Data from Gayek et al Gayek_04. code
Figure 61. Example supply and demand curves. code
Figure 62. Rates at which product sales are made on Gumroad at various prices; lines join prices that differ by 1¢, e.g., $1.99 and $2. Data from Nichols Nichols_13. code
Figure 63. Growth of Github users during its first 58 months. Data from Irving Irving_16. code
Figure 64. Percentage of sales closed in a given week of a quarter, with average discount given. Data from Larkin Larkin_13. code
Figure 65. Facebook’s ARPU and cost of revenue per user. Data from Facebook’s 10-K filings Facebook_14, Facebook_16. code
Figure 66. Top 100 software companies ranked by total revenue (in millions of dollars) and ranked by Software-as-a-Service revenue. Data from PwC PwC_13, PwC_14, PwC_16. code
Figure 67. Various vendors' retail and upgrade prices for C and C++ compilers available under MS-DOS and Microsoft Windows between 1987 and 1998. Data kindly provided by Viard Viard_07. code

Software ecosystems

Figure 68. Decade in which newly designed US Air Force aircraft first flew. Data from Eckbreth et al Eckbreth_11. code
Figure 69. Number of process model change requests made in three years of a banking Customer registration project. Data kindly supplied by Branco Branco_12. code
Figure 70. Man hours required to build a particular kind of ship, at the Delta Shipbuilding yard, delivered on a given date. Data from Thompson Thompson_07. code
Figure 71. Dependencies between the Java packages in various versions of ANTLR. Data from Al-Mutawa Al-Mutawa_13. code
Figure 72. Number of pdf files created using a given version of the portable document format appearing on sites having a .uk web address between 1996 and 2010. Data from Jackson Jackson_12. code
Figure 73. Percentage share of total Android market at days since launch for various versions of Android. Data from Bidouille Bidouille_15. code
Figure 74. Introductory price and benchmark performance of various Intel processors over 2003-2013. Data from Sun Sun_14. code
Figure 75. Number of transistors, frequency and SPEC performance of cpus when first launched. Data from Danowitz et al Danowitz_12. code
Figure 76. Shipments, per year, of various computers between 1975 and 2012. Data from Reimer Reimer_12. code
Figure 77. Number of optional features selected by a given number of flags. Data kindly provided by Berger Berger_12. code
Figure 78. Percentage of code ported from NetBSD to various versions of OpenBSD, broken down by version of NetBSD in which it first occurred (denoted by incrementally changing color). Data kindly provided by Ray Ray_13.
Figure 79. Percentage of source in 130 releases of Linux that originates in an earlier release. Data extracted from png file kindly supplied by Matsushita Livieri_07. code
Figure 80. Monthly unit sales (in millions) of microprocessors having a given bus width. Data kindly supplied by Turley Turley_02. code
Figure 81. Number of projects making use of a given number of different languages in a sample of 100,000 GitHub projects. Data kindly supplied by Bissyande Bissyande_13. code
Figure 82. CPU product lifetime… Data from … code
Figure 83. Number of software systems surviving to a given number of years and an exponential fit. Data from Tamai Tamai_92. code
Figure 84. Ratio of development costs to five year maintenance costs for 158 IBM software systems sorted by size; curve is a beta distribution fitted to the data (in red). Data from Dunn Dunn_11. code
Figure 85. Survival curves of clones in the Linux high/medium/low level SCSI subsystems. Data from Wang Wang_12. code
Figure 86. Survival curve for packages included in the standard Debian distribution. Data from Caneill et al Caneill_14. code
Figure 87. Density plot of time interval, in hours, between each modification of a function in Evolution. Data from Robles et al Robles_12a. code
Figure 88. Number of functions (in Evolution) modified a given number of times broken down by number of authors. Data from Robles et al Robles_12a. code
Figure 89. Number of functions (in Evolution; the points at zero are incorrect counts) modified a given number of times (upper) or modified by a given number of different people (lower); red line is a straight line fit, green line a quadratic fit. Data from Robles et al Robles_12a. code
Figure 90. Number of identifiers renamed in the source of Eclipse-JDT, in a given month; date of version release marked. Data from Eshkevari et al Eshkevari_11. code
Figure 91. Survival curve for table rows in Wikimedia database. Data from Curino et al Curino_08. code
Figure 92. Changes in the number of tables (66 modifications) and total number of columns (150 modifications) in the Mediawiki database schema over elapsed time and change version. Data from Curino et al Curino_08. code
Figure 93. Percentage of patches submitted to WebKit (34,535 in total) transitioning between various stages of code review. Data from Baysal et al Baysal_13. code

Software development projects

Figure 94. Percentage profit/loss on fixed-price software development contracts… Data extracted from … code
Figure 95. Commits within a particular hour and day of week for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
Figure 96. Cone of uncertainty in estimated cost with constant accuracy and costs per time interval (top left), with 1% improvement in accuracy in each time interval (bottom left/right), and with three different spends per time interval, c(0.5*(0:30), 15+1.5*(1:30), 60+1:40), (right top/bottom). code
Figure 97. Estimated effort against actual effort (in hours). Data from Jørgensen Jorgensen_04b. code
Figure 98. Quoted bid price and estimated effort from 14 companies… Data from Anda et al Anda_09. code
Figure 99. Percentage difference in two estimates for the same six projects made by seven developers… Data from Grimstad et al Grimstad_07. code
Figure 100. Actual project duration against number of schedule estimates made for it. Data from Little Little_06. code
Figure 101. Distribution of effort (person hours) during the development of four engine control systems projects, plus non-project work and holidays, at Rolls-Royce. Data extracted from Powell Powell_01. code
Figure 102. Phase during which work on a given phase of development was actually performed. Data from Zelkowitz Zelkowitz_88. code
Figure 103. Average value assigned to requirements (red) and one standard deviation bands (blue) based on omitting one stakeholder’s value list. Data from Regnell et al Regnell_01. code
Figure 104. Number of requirements added/deleted/modified in 22 releases of a product containing eight features (upper) and total number of requirements against requirements changed for those eight features (lower). Data extracted from Felici Felici_04. code
Figure 105. Pagerank of the stakeholder nodes in the network created from the Open (green) and Closed (blue) stakeholder responses (values for each have been sorted). Data from Lim Lim_10. code
Figure 106. Average number of feature implementations started (blue) and their average duration (red); a 30 day rolling mean has been applied to both. Data kindly supplied by 7Digital 7Digital_12. code
Figure 107. Number of features whose implementation took a given number of elapsed workdays. Top first 650 days, bottom after 650 days. Green line is the fitted negative binomial distribution. Data kindly supplied by 7Digital 7Digital_12. code
Figure 108. Number of feature developments started on a given work day (red bug fixes, blue non-bug work, black ratio of two values; 20 day rolling mean bottom left, 50 day top right, 120 day bottom right). Data kindly supplied by 7Digital 7Digital_12. code

Reliability

Figure 109. Input case on which a failure occurred, for a total of 500,000 inputs. Data from Dunham et al Dunham_86. code
Figure 110. Number of input cases processed before a given fault is experienced. Data from Dunham et al Dunham_86. code
Figure 111. Number of input cases processed before a given number of program failures is experienced; 25 replications. Data from Dunham et al Dunham_86. code
Figure 112. Time taken, in 10 distinct runs, to discover a thread safety violation in 22 different Java classes. Data kindly supplied by Pradel Pradel_12. code
Figure 113. Fraction of mutated programs, in various languages, that successfully compiled/executed/produced same output. Data from Spinellis et al Spinellis_12. code
Figure 114. Total number of failures per 30-day interval for each LANL system. Data from Los Alamos National Lab (LANL). code
Figure 115. Total number of failures for each node in the given LANL system. Data from Los Alamos National Lab (LANL). code
Figure 116. For systems 2 and 18, number of uptime intervals, binned into 10 hour intervals, red line is fitted negative binomial distribution. Data from Los Alamos National Lab (LANL). code
Margin Fault slip throughs for a development project at Ericsson (left column lists when fault could have been detected, bottom row when fault was detected). Data from Hribar Hribar_08. code
Figure 117. Various test suite coverage measures and mutants killed in 300 or so Java projects; black line is a loess fit. Data from Gopinath et al Gopinath_14. code
Figure 118. Statement (triangles) and branch (stars) coverage achieved using a program’s test suite… Data from Marinescu et al Marinescu_14. code
Figure 119. Amount of source (millions of lines) in each version broken down by the version in which it first appears. Data extracted from Massacci et al Massacci_11. code
Figure 120. Market share of Firefox versions between official release and end-of-support. Data from w3schools.com. code
Figure 121. Number of people with Internet access per 100 head of population in the developed world and the whole world. Data from ITU ITU_12. code
Figure 122. Amount of end-user usage of code originally written for Firefox version 1.0 by various other versions. Data extracted from Massacci et al Massacci_11. code

Faults

Figure 123. Transition counts of the order in which five distinct faults were discovered in 50 runs of Program A2. Data from Nagel et al Nagel_82. code
Figure 124. Number of input cases that occurred before a particular fault was experienced by program A2; the list was sorted for each fault. Data from Nagel et al Nagel_82. code
Figure 125. Number of accesses to memory address blocks, per 100,000 instructions, executing gzip on two different inputs. Data from Brigham Young Brigham_Young via Feitelson. code
Figure 126. Number of incidents reported in each of 800 applications installed on over 120,000 desktop machines. Data from Lucente Lucente_15. code
Figure 127. Power analysis (50 and 10 runs at various p-values) of detecting a difference between two runs having a binomial distribution (runs needed to achieve power=0.8 at various p-values). code
Figure 128. Percentage of usability problems found by a given number of test subjects. Data extracted from Nielsen et al Nielsen_93. code
Figure 129. Survival rate of faults in Linux device drivers and other Linux subsystems… Data from Palix et al Palix_10b. code
Figure 130. Defects found against hours of testing… Data from Wood Wood_96. code
Figure 131. Percentage of reported problems having a given mean time to first problem occurrence (in months, summed over all installations of a product) for nine products. Data from Adams Adams_84. code
Figure 132. Survival curve of the two most common warnings reported by Splint in Samba and Squid. Data from Di Penta et al Di_penta_09. code
Figure 133. Reported faults against number of installations (upper) and age (lower)… Data from the "wheezy" version of Debian UDD_14. code
Figure 134. Number of various kinds of fault found during code review of nine implementations of the same specification and how located. Data extracted from Finifter Finifter_13b. code

Source code

Figure 135. Boxplot of ratings given to snippets 1 to 50 by second year students (colors used to help distinguish boxplots for each snippet). code
Figure 136. Aggregated ranking of snippets by subjects in years 1 and 2 (red and black) and years 2 and 4 (black and blue). Snippets have been sorted by year 2 ranking. code
Figure 137. Correlation, using Kendall’s tau, between each subject and their corresponding year aggregate ranking. code
Figure 138. Number of files and lines of code in 3,782 projects on Sourceforge. Data from Herraiz Herraiz_08. code
Figure 139. Total number of C functions measured, their total unused parameters and two fitted models. Data from Jones <book Jones_??>. code
Figure 140. Occurrences of sequences of java.lang.StringBuilder methods called on the same object in 11 GB of Java bytecode. Data from Mendez et al Mendez_13. code
Figure 141. For each class the percentage of method sequences containing a given number of calls (in 11 GB of Java bytecode). Data from Mendez et al Mendez_13. code
Figure 142. Number of commits of a given length, in lines added/deleted to fix various faults in Linux file systems. Data from Lu et al Lu_13. code
Figure 143. "Worth estimate" for identifier visibility ordering preferences declarations within a Java class. Data from Biegel et al Biegel_12. code
Figure 144. "Worth estimate" for the kind of method activity attribute. Data from Biegel et al Biegel_12. code
Figure 145. Number of method calls to Java APIs and non-APIs in 6,286 Open source projects. Data from Lämmel et al Lammel_11. code
Figure 146. Percentage occurrence of values appearing as the most significant digit of floating-point, integer and hexadecimal literals in C source code. Data from Jones Jones_05a. code
Figure 147. Lines of code, Halstead’s volume and cyclomatic complexity of Linux version 2.6.9. Data from Israeli et al Israeli_10. code
Figure 148. Number of feature constants against LOC for 40 large C programs and two fitted regression lines (red and green; blue is one confidence interval). Data from Liebig et al Liebig_10. code

Stories told by data

Figure 149. Years of professional experience in a given language for experimental subjects. Data from Prechelt Prechelt_07. code
Figure 150. Plots of sample values having various visual patterns. code
Figure 151. Total number of lines of C code, in .c and .h files, having a given length, i.e., containing a given number of characters (upper) and tokens (lower). Data from Jones Jones_05a. code
Figure 152. Various measurements of work performed implementing the same functionality, number of lines of Haskell and C implementing functionality, CFP (COSMIC function points; based on user manual) and length of formal specification. Data kindly provided by Staples Staples_13. code
Figure 153. Effort, in hours (log scale), spent in various development phases of projects written in Ada (blue) and Fortran (red). Data from Waligora et al Waligora_95. code
Figure 154. Performance of experts (e) and novices (n) in a test driven development experiment. Data from Muller et al Muller_07. code
Figure 155. Correlations between pairs of attributes of 12,799 Github pull requests to the Homebrew repo, represented using colored ellipses. Data from Gousios et al Gousios_14. code
Figure 156. Correlations between pairs of attributes of 12,799 Github pull requests to the Homebrew repo, represented using pie charts and shaded boxes. Data from Gousios et al Gousios_14. code
Figure 157. Hierarchical cluster of correlation between pairs of attributes of 12,799 Github pull requests to the Homebrew repo. Data from Gousios et al Gousios_14. code
Figure 158. Effort invested in project definition (as percentage of original estimate) against cost overrun (as percentage of original estimate). Data extracted from Gruhl Gruhl_9x. code
Figure 159. Relative clock frequency of cpus when first launched (1970 == 1). Data from Danowitz et al Danowitz_12. code
Figure 160. Year and age at which survey respondents started contributing to FLOSS, i.e., made their first FLOSS contribution. Data from Robles et al Robles_14. code
Figure 161. SPECint results, summed over all distinct values (upper) and summed within equal width bins (lower). Data from SPEC website SPEC_14. code
Figure 162. Kernel density plot of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
Figure 163. Number of commits containing a given number of lines of code made when making various categories of changes to the Linux filesystem code (upper) and a density plot of the same data (lower). Data from Lu et al Lu_13. code
Figure 164. Three commonly used kernel density smoothing functions: gaussian, rectangular and triangular. code
Figure 165. Developer estimated effort against actual effort (in hours), for various maintenance tasks, e.g., adaptive, corrective and perfective; upper as-is, middle jittered values and lower size proportional to the log of the number of measurements. Data from Hatton Hatton_07. code
Figure 166. Number of installations of Debian packages against the age of the package; middle plot was created by smoothScatter and lower plot by contour. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
Figure 167. Number of lines added to glibc each week. Data from González-Barahona et al Gonzalez-Barahona_14. code
Figure 168. Boxplot of time between a bug in Eclipse being reported and the first response to the report; right plot is notched. Data from Breu et al Breu_10. code
Figure 169. Violin plots (left using vioplot, right using beanplot) of time between bug being reported in Eclipse and first response to the report. Data from Breu et al Breu_10. code
Figure 170. Time taken for developers to debug various programs using batch processing or online (i.e., time-sharing) systems. Data kindly provided by Prechelt Prechelt_99a. code
Figure 171. Pairs of languages used together in the same GitHub project with connecting line width, color and transparency related to number of occurrences. Data kindly supplied by Bissyande Bissyande_13. code
Figure 172. References from one document to another in the Microsoft Server Protocol specifications. Data extracted by the author from the 2009 document release WSPP_15. code
Figure 173. Alluvial plot of relative prioritization order of selection and application of Github pull requests. Data from Gousios et al Gousios_15a. code
Figure 174. Intel Sandy Bridge L3 cache bandwidth in GB/s at various clock frequencies and using combinations of cores (0-3 denotes cores zero-through-three, 0,2,4 denotes the three cores zero, two and four). Data from Schone et al Schone_12. code
Figure 175. Contour plot of the number of sessions executed on a computer having a given processor speed and memory capacity. Data kindly provided by Thereska Thereska_10. code
Figure 176. Root source of 1,257 faults and where fixes were applied for 21 large safety critical applications. Data from Hamill et al Hamill_14. code
Figure 177. Ternary plots drawn with two possible visual aids for estimating the position of a point (red plus at x=0.1, y=0.35, z=0.55); axis names appear on the vertex opposite the axis they denote. code
caption=
Figure 178. Earth relative positions of NASA’s Orbview-2 spacecraft when it experienced a single event upset (in blue) on 12 July 2000. Data kindly provided by LaBel Poivey_03. code
caption=
Figure 179. Estimated market share of Android devices by brand and product, based on downloads from 682,000 unique devices in 2015. Data from OpenSignal OpenSignal_15. code
caption=
Figure 180. Variables having a given number of read accesses, given 25, 50, 75 and 100 total accesses, calculated from running the weighted preferential attachment algorithm (red), the smoothed data (blue) and a fitted exponential (green). code
caption=
Figure 181. Throughput when running the SPEC SDM91 benchmark on a Sun SPARCcenter 2000 containing 8 CPUs, with the predictions from three fitted queuing models. Data from Gunther Gunther_05. code
caption=
Figure 182. Illustration of the difference in cognitive effort needed to locate points differing by shape or color. code
caption=
Figure 183. The three, seven and twelve color palettes returned by calls to the diverge_hcl, sequential_hcl, rainbow_hcl and rainbow functions. code
caption=
Figure 184. Percentage share of the Android market by successive Android releases between 2010 and 2015. Data from Bidouille Bidouille_15. code
caption=
Figure 185. Values plotted using a linear (upper) and logarithmic (lower) x-axis. Data from Dunham et al Dunham_86. code
caption=
Figure 186. Illustration of U-shape created when y-axis values are a ratio calculated from x-axis values. code
caption=
Figure 187. Mean time to fail for systems of various sizes (measured in lines of code); linear y-axis left, log y-axis right. Data extracted from Figure 8.3 of Putnam et al Putnam_92. code
caption=
Figure 188. Alternative representation of numeric values in Table. Data from Scott Scott_16. code
caption=
Figure 189. What’s up doc? Not the fitted model you were expecting. Equations from White White_12. code

Probability

caption=
Figure 190. Probability that three (red) or four (blue) consecutive false positive warnings occur in some total number of warnings (false positive rate appears on line). code
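The quantity plotted in Figure 190 can be sketched as follows. This is a minimal illustration, not the code behind the figure: it assumes each warning is independently a false positive with the same probability, and the name prob_run is hypothetical. It tracks the length of the current run of false positives and accumulates the probability that no run of the required length has yet occurred.

```python
def prob_run(n: int, k: int, p: float) -> float:
    """Probability that n independent warnings contain at least one run
    of k consecutive false positives, each occurring with probability p
    (independence is an assumption of this sketch)."""
    # state[j]: probability that the current trailing run of false
    # positives has length j (j < k) and no run of length k has occurred.
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        nxt = [0.0] * k
        nxt[0] = (1.0 - p) * sum(state)   # a true positive resets the run
        for j in range(k - 1):
            nxt[j + 1] = p * state[j]     # a false positive extends the run
        # The mass p * state[k - 1] completes a run of length k; it is
        # dropped here, which is equivalent to moving it to an absorbing state.
        state = nxt
    return 1.0 - sum(state)


if __name__ == "__main__":
    for rate in (0.1, 0.3, 0.5):
        print(rate, prob_run(50, 3, rate), prob_run(50, 4, rate))
```

For a fixed false positive rate, the probability of a run of three is never smaller than that of a run of four, and both grow with the total number of warnings.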
caption=
Figure 191. The relationship between words for tracts of trees in various languages. The interpretation given to words (boundary indicated by the zigzags) in one language may overlap that given in other languages. Adapted from DiMarco et al DiMarco_93.
caption=
Figure 192. Relationships between common discrete and continuous probability distributions.
caption=
Figure 193. Shapes of commonly encountered discrete probability distributions (upper to lower: Uniform, Geometric, Binomial and Poisson). code
caption=
Figure 194. Cumulative density plots of the discrete probability distributions in Figure. code
caption=
Figure 195. Commonly encountered continuous probability distributions (upper to lower: Uniform, Exponential, Normal, Beta). code
caption=
Figure 196. Samples of randomly selected values drawn from the same normal distribution (left: 100 points in each sample, right: 1,000 points in each sample). code
caption=
Figure 197. Reading rate for text printed using a serif (blue) and sans-serif (red) font; data has been normalised and displayed as a density. Data from Veytsman et al Veytsman_12. code
caption=
Figure 198. Probability, with 95% confidence, that shapiro.test correctly reports that samples drawn from various distributions are not drawn from a Normal distribution, and probability of an incorrect report when the sample is drawn from a Normal distribution. code
caption=
Figure 199. Number of conditionally compiled code sequences dependent on a given number of feature macros (red overwritten by blue: Linux, blue: FreeBSD). Data from Berger et al Berger_10. code
caption=
Figure 200. Percentage occurrence of statements for each of 100 or so C, C++ and Java programs, plotted as a density on the y-axis. Data from Zhu et al Zhu_15. code
caption=
Figure 201. A Cullen and Frey graph for the $3n+1$ program length data. Data kindly provided by van der Meulen van_der_Meulen_07. code
caption=
Figure 202. Number of 3n+1 programs containing a given number of lines and four distributions fitted to this data. Data kindly provided by van der Meulen van_der_Meulen_07. code
caption=
Figure 203. A zero-truncated Negative Binomial distribution fitted to the number of features whose implementation took a given number of elapsed workdays; first 650 days used. Data kindly provided by 7digital 7Digital_12. code
caption=
Figure 204. Percentage of function definitions in embedded applications, the SPECint95 benchmark, and the translated form of C source benchmark programs declared to have a given number of parameters. Data for embedded and SPECint95 kindly supplied by Engblom Engblom_99a, C book data from Jones Jones_05a. code
caption=
Figure 205. Density plot of MPI micro-benchmark runtime performance for calls to MPI_Scan with 10,000 Bytes (upper) and to MPI_Allreduce with 1,000 Bytes (lower). Data kindly supplied by Hunold Hunold_14. code
caption=
Figure 206. Mixture model fitted by the normalmixEM function to the performance data from calls to MPI_Allreduce. Data kindly supplied by Hunold Hunold_14. code
caption=
Figure 207. Density plot of accesses to one article on Slashdot, in minutes since its publication. The distinct Normal distributions (colored and fitted to the log of the data) contained in the mixture models fitted by the REBMIX (upper) and normalmixEM (lower) functions. Data kindly supplied by Kaltenbrunner Kaltenbrunner_07. code
caption=
Figure 208. Cumulative probability distribution of file size (red) and of number of bytes occupied in a file system (blue). Data from Irlam Irlam_93. code
caption=
Figure 209. Graph of available state transitions for Alaris volumetric infusion pump (the button presses that cause transitions between states are not shown). Data kindly supplied by Oladimeji Oladimeji_08. code
caption=
Figure 210. Discrete-time Markov chain for created/modified/deleted status of Linux kernel files at each major release from versions 2.6.0 to 2.6.39. Data from Tarasov Tarasov_12. code
caption=
Figure 211. Directed graph of emails between FreeBSD and OpenBSD developers, plus a few people involved in both discussions, with developers who sent/received fewer than four emails removed. Data from Canfora et al Canfora_11. code
caption=
Figure 212. Expected probability of a single instance (y-axis) against the probability of a measured struct type having grouped member types (x-axis); when both probabilities are the same points will be along the blue line. Data from Jones Jones_09b. code

Statistics for software engineering

caption=
Figure 213. Example of a sample drawn from a population. code
caption=
Figure 214. Date of introduction of a cpu against its commercial lifetime. Data from Culver Culver_10. code
caption=
Figure 215. A population of items having one of three colors and three strata sampled from it. code
caption=
Figure 216. Power consumed by three SERT benchmark programs at various levels of system load; crosses at 2% load intervals, lines based on 10% load intervals. Data kindly provided by Kistowski Kistowski_15. code
caption=
Figure 217. Distribution of 4,000 sample means for two sample sizes drawn from exponential (left), lognormal (center) and Pareto (right) distributions, vertical lines are 95% confidence bounds. The blue curve is the Normal distribution predicted by theory. code
caption=
Figure 218. Mean (red) and standard deviation (grey lines; they are not symmetrical because of the log scaling) of samples of 3 items drawn from a population of 1,000 items (blue line mean, green line standard deviation). Data kindly provided by Chen Chen_12. code
caption=
Figure 219. Density plot of mean of samples containing 3 or 12 items randomly selected from a data set of 1,000 items; process repeated 1,000 times for each sample size. Data kindly provided by Chen Chen_12. code
caption=
Figure 220. Number of commits to glibc for each day of the week, for the years from 1991 to 2012. Data from González-Barahona et al Gonzalez-Barahona_14. code
caption=
Figure 221. A Normal distribution with mean=4 and variance=8 and a Chi-squared distribution with four degrees of freedom having the same mean and variance (the vertical lines are at the distributions' median value). code
caption=
Figure 222. Density plot of execution time of 1,000 input data sets, with lines marking the mean, median and mode. Data kindly supplied by Chen Chen_12. code
caption=
Figure 223. Impact of serial correlation, AR(1) in this example, on the calculated mean (upper) and standard deviation (lower) of a sample (the legends specify the amount of serial correlation). code
caption=
Figure 224. Occurrence of sample median and mean values for 1,000 samples drawn from a binomial distribution. code
caption=
Figure 225. A contaminated normal, values drawn from two normal distributions with 10% of values drawn from a distribution having a standard deviation five times greater than the other. code
caption=
Figure 226. Regression model (red line; pvalue=0.02) fitted to the number of correct/false security code review reports made by 30 professionals; blue lines are 95% confidence intervals. Data from Edmundson et al Edmundson_13. code
caption=
Figure 227. Bootstrapped regression lines fitted to random samples of the number of correct/false security code review reports made by 30 professionals. Data from Edmundson et al Edmundson_13. code
caption=
Figure 228. Kernel density plot, with 95% confidence interval, of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
caption=
Figure 229. The four related quantities in the design of experiments. code
caption=
Figure 230. Examples of the impact of population prevalence, statistical power and p-value on number of false positives and false negatives. code
caption=
Figure 231. Visualization of Cohen’s $d$ for two normal distributions having different means and the same standard deviation (two left) and both different (right). code
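As a rough illustration of the quantity visualized in Figure 231, the following sketch computes Cohen's $d$ as the difference in sample means divided by the pooled standard deviation. The pooled form is one common definition and is an assumption here (the figure's own code may use a different variant); the function name cohens_d is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d for two samples, using the pooled standard deviation
    (one common definition; assumed for this sketch)."""
    n_a, n_b = len(a), len(b)
    # Pool the two sample variances, weighting by degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


if __name__ == "__main__":
    print(cohens_d([2, 4, 6], [1, 3, 5]))
```

When the two samples have the same standard deviation, the pooled value equals that shared value, so $d$ is simply the difference in means measured in standard deviations.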
caption=
Figure 232. The impact of differences in mean and standard deviation on the overlap between two populations ($\alpha$: probability of making a false positive error, and $\beta$: probability of making a false negative error). code
caption=
Figure 233. The power of a statistical test at detecting that a difference exists between the mean value of two samples drawn from two populations, both having a Normal distribution. code

Regression modeling

caption=
Figure 234. Relationship between data characteristics (edge labels) and applicable techniques (node labels) for building regression models.
caption=
Figure 235. Total lines of source code in FreeBSD by days elapsed since the project started (in 1993). Data from Herraiz Herraiz_08. code
caption=
Figure 236. Estimated cost and duration of 73 large Dutch federal IT projects, along with fitted model and 95% confidence intervals. Data from Kampstra et al Kampstra_09. code
caption=
Figure 237. Number of updates and fixes in each Linux release between version 2.6.11 and 3.2. Data from Corbet et al Corbet_12. code
caption=
Figure 238. The number of commits made and the number of contributing developers for Linux versions 2.6.0 to 3.12. The green line in the right plot is the regression model fitted by switching the x/y values. Data from Kroah-Hartman Kroah-Hartman_14. code
caption=
Figure 239. Effort/Size of various projects and regression lines fitted using Effort as the response variable (red, with green 95% confidence intervals) and Size as the response variable (blue). Data from Jørgensen et al <book Jorgensen_0?>. code
caption=
Figure 240. Lines of code in every initial release (i.e., excluding bug-fix versions of a release) of the Linux kernel since version 1.0, along with fitted straight line (upper) and quadratic (lower) regression models. Data from Israeli et al Israeli_10. code
caption=
Figure 241. Actual (left of vertical line) and predicted (right of vertical line) total lines of code in Linux at a given number of days since the release of version 1.0, derived from a regression model built from fitting a cubic polynomial to the data (dashed lines are 95% confidence bounds). Data from Israeli et al Israeli_10. code
caption=
Figure 242. Number of classes in the Groovy compiler at each release, in days since version 1.0. Data from Vasa Vasa_10. code
caption=
Figure 243. For each distinct language, the number of lines committed on Github and the number of questions tagged with that language. Data from Kunst Kunst_13. code
caption=
Figure 244. Percentage of vulnerabilities detected by developers working a given number of years in security. Data extracted from Edmundson et al Edmundson_13. code
caption=
Figure 245. Hours to develop software for 29 embedded consumer products and the amount of code they contain, with fitted regression model and loess fit (yellow). Data from Fenton et al Fenton_08. code
caption=
Figure 246. Points remaining after removal of overly influential observations, repeatedly applying Cook’s distance and Studentized residuals. Data from Fenton et al Fenton_08. code
caption=
Figure 247. Points remaining after removal of overly influential observations, also taking into account the Bonferroni p-value of the Studentized residuals; the line shows the fitted model and 95% confidence interval (loess fit in yellow). Data from Fenton et al Fenton_08. code
caption=
Figure 248. influenceIndexPlot for the model having the fitted line shown in Figure. Data from Fenton et al Fenton_08. code
caption=
Figure 249. Number of medical devices reported recalled by the US Food and Drug Administration, in two week bins. Upper: fitted straight line and confidence bounds, with loess fit (green); Lower: straight line (purple) fitted after two outliers replaced by mean and original fit (red). Data from Alemzadeh et al Alemzadeh_13. code
caption=
Figure 250. influenceIndexPlot of data from Alemzadeh et al Alemzadeh_13. code
caption=
Figure 251. Two fitted straight lines and confidence intervals, one up to the end of 2010 and one after 2010. Data from Alemzadeh et al Alemzadeh_13. code
caption=
Figure 252. Results from various studies of software requirements function points counted using COSMIC and FPA; lines are loess fits to studies based on industry and academic counters. Data from Amiri et al Amiri_11. code
caption=
Figure 253. Five different equations fitted to the Embedded subset of the COCOMO 81 data before influential observation removal (upper) and after influential observation removal (lower). Data from Boehm Boehm_81. code
caption=
Figure 254. Anscombe data sets with Pearson correlation coefficient, mean, standard deviation, and line fitted using linear regression. Data from Anscombe Anscombe_73. code
caption=
Figure 255. Residual of the straight line fit to the Linux growth data analysed in Figure (upper) and data+straight line fit (red) and loess fit (blue). Data from Israeli et al Israeli_10. code
caption=
Figure 256. Array element assignment benchmark compiled with gcc using the O0 (upper) and O3 (lower) options (measurements were grouped into runs of 2,000 executions). Data from Flater et al Flater_13. code
caption=
Figure 257. Number of installations of Debian packages against the age of the package, plus fitted model and loess fit. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
caption=
Figure 258. Quadratic relationship with various amounts of added noise fitted using a quadratic and exponential model. code
caption=
Figure 259. Author workload against number of activity types per author (upper) and ratio test (lower). Data from Vasilescu et al Vasilescu_12. code
caption=
Figure 260. Change points detected by cpt.mean, upper using method="AMOC" and lower using method="PELT". Data from Alemzadeh et al Alemzadeh_13. code
caption=
Figure 261. Number of flags (y-axis jittered) used to control the selection of optional features in systems containing a given total number of features, loess curve (red), regression line (green). Data from Berger et al Berger_12. code
caption=
Figure 262. Monthly unit sales (in thousands) of 4-bit microprocessors. Data kindly supplied by Turley Turley_02. code
caption=
Figure 263. Fitted regression line to points (in red) and 3-D illustration of assumed Normal distribution of errors. code
caption=
Figure 264. Number of vulnerabilities detected by professional developers with web security review experience; upper: technically correct plot of model fitted using a Poisson distribution, lower: easier to interpret curve representation of fitted regression models, assuming the error has a Poisson distribution (continuous lines) or a Normal distribution (dashed lines). Data extracted from Edmundson et al Edmundson_13. code
caption=
Figure 265. Number of functions containing a given number of break statements and a fitted Negative Binomial distribution. Data from Jones Jones_05a. code
caption=
Figure 266. Code review meeting duration for a given number of non-comment lines of code. Fitted regression model, assuming errors have a Gamma distribution (red, with confidence interval in blue) or a Normal distribution (green). Data from Porter et al Porter_98. code
caption=
Figure 267. Number of APIs used in Java programs containing a given number of lines and three fitted models. Data from Starek Starek_10. code
caption=
Figure 268. Yearly development cost and lines of Fortran code delivered to the US Air Force between 1962 and 1984, with fitted regression models. Data extracted from NeSmith NeSmith_86. code
caption=
Figure 269. Maintenance task effort and lines of code added+updated, with fitted regression model (red) and SIMEX adjusted for 10% error (blue). Data from Jørgensen Jorgensen_95. code
caption=
Figure 270. Regression modeling 0/1 data with a straight line and a logistic equation. code
caption=
Figure 271. ROC curve for the data listed in Table. code
caption=
Figure 272. Percentage of mutants killed at various percentages of path coverage for 300 or so Java projects; fitted Beta (red) and glm (blue) regression models. Data from Gopinath et al Gopinath_14. code
caption=
Figure 273. SPECint 2006 performance results for processors running at various clock rates, memory chip frequencies and processor family. Data from SPEC SPEC_14. code
caption=
Figure 274. Component+residual plots for three explanatory variables in a fitted SPECint model. code
caption=
Figure 275. Individual contribution of each explanatory variable to the response variable in a quadratic model of SPECint performance. code
caption=
Figure 276. Estimated and actual effort broken down by communication frequency, along with individually fitted straight lines. Data from Moløkken-Østvold et al Molokken_Ostvold_07. code
caption=
Figure 277. Illustration of the shared and non-shared contributions made by two explanatory variables to the response variable Y. code
caption=
Figure 278. pairs plot of lines added/modified/removed, growth and number of files and total lines in versions 2.6.0 through 3.9 of the Linux kernel. Data from Kroah-Hartman Kroah-Hartman_14. code
caption=
Figure 279. Example plots of functions listed in Table. These equations can be inverted, so they start high and go down. code
caption=
Figure 280. Time to execute a computational biology program on systems containing processors with various L2 cache sizes. Data kindly provided by Hazelhurst Hazelhurst_10. code
caption=
Figure 281. A logistic equation fitted to the lines of code in every non-bugfix release of the Linux kernel since version 1.0. Data from Israeli et al Israeli_10. code
caption=
Figure 282. Predictions by logistic equations fitted to Linux SLOC data, using subsets of the data up to 2,900, 3,650 and 4,200 days, and all days, since the release of version 1.0. Data from Israeli et al Israeli_10. code
caption=
Figure 283. Increase in areal density of hard disks entering production over time. Data from Grochowski et al Grochowski_12. code
caption=
Figure 284. Lines of code in the GNU C library against days since 1 January 1990. Data from González-Barahona Gonzalez-Barahona_14. code
caption=
Figure 285. Number of failing programs caused by unique faults in gcc (upper) and SpiderMonkey (lower). Fitted model in green, with two exponential components in red and blue. Data kindly provided by Chen Chen_13. code
caption=
Figure 286. Power consumption of six different Intel Core i5-540M processors running at various frequencies; colored lines denote fitted regression models for each processor. Data from Balaji et al Balaji_12. code
caption=
Figure 287. Example showing the three ways of structuring a mixed effects model, i.e., different intersections/same slope (upper), same intersection/different slopes (middle) and different intersections/slopes (lower). code
caption=
Figure 288. Confidence intervals, 95%, for within-subject intercept and slope (right plots) of mixed-effect models in the adjacent code. code
caption=
Figure 289. The three components of the hourly rate of commits, during a week, to the Linux kernel source tree; components extracted from the time series by stl. Data from Eyolfson et al Eyolfson_11. code
caption=
Figure 290. Autocorrelation of number of defects found on a given day, for development project C. Data kindly provided by Buettner Buettner_08. code
caption=
Figure 291. Autocorrelation of two AR models (upper plots) and two MA models (lower plots). code
caption=
Figure 292. Partial autocorrelation of same two AR models (upper plots) and two MA models (lower plots) shown in Figure. code
caption=
Figure 293. Autocorrelation of indentation of source code written in various languages. Data from Hindle et al Hindle_08. code
caption=
Figure 294. Number of features started for each day and fitted regression trend line (left) and number of features after subtracting the trend (right), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 295. Autocorrelation (left) and partial autocorrelation (right) of the number of features started on a given day (after differencing the log transformed data), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 296. Predicted daily difference in the number of new feature starts (red) and 95% confidence intervals (blue). Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 297. Time series whose values are uncorrelated (upper), but whose squared values are correlated (lower); see code for generation process. code
caption=
Figure 298. Cross correlation of feature release ‘size’ (upper non-bugfix releases, lower all releases) and date when bugs are prioritised. Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 299. Estimated staff working on a project during every week. Data from Buettner Buettner_08. code
caption=
Figure 300. Market share of Firefox version 3.0 fitted using loess regression with various values of the span option. Data from W3Counter W3Counter_14. code
caption=
Figure 301. Cross-correlation of source lines added/deleted per week to the glibc library. Data from González-Barahona Gonzalez-Barahona_14. code
caption=
Figure 302. Visualization of alignment between weekly time series of lines of code in NetBSD (blue) and FreeBSD (red). Data from Herraiz Herraiz_08. code
caption=
Figure 303. Effort distribution (person hours) over the eight main tasks of a development project at Rolls-Royce and a hierarchical clustering of each task effort time series based on pair-wise correlation and Euclidean distance metrics. Data extracted from Powell Powell_01. code
caption=
Figure 304. Two commonly used hazard functions: Weibull, which is monotonic (always increases, decreases or remains the same), and Lognormal, which can increase and then decrease. code
caption=
Figure 305. Observation period with events inside and outside the study period. code
caption=
Figure 306. The Kaplan-Meier curve for survivability of new releases: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals. Data from Businge Businge_13. code
caption=
Figure 307. The Kaplan-Meier curve for survivability of ETPs' ability to be built using SDKs released in subsequent years: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals, with plus signs, +, indicating censored data. Data from Businge Businge_13. code
caption=
Figure 308. Kaplan-Meier curves for time-to-fix…. Data from Arora et al Arora_10. code
caption=
Figure 309. Survival curve after adjustment for explanatory variables… code
caption=
Figure 310. Cumulative incidence curves for problems reported by the splint tool in Samba and Squid (time is measured in number of snapshot releases). Data from Di Penta et al Di_penta_09. code
caption=
Figure 311. Rose diagram of number of commits in each 3 hour period of a day for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
caption=
Figure 312. The Cartwright (red; dcarthwrite), wrapped Cauchy (green; dwrappedcauchy) and wrapped von Mises (blue; dvonmises) circular probability distributions for various values of their parameters. code
caption=
Figure 313. Asymmetric extended wrapped forms of the Cardioid (upper), von Mises (middle) and Cauchy (lower) probability distributions for various values of their parameters. code
caption=
Figure 314. Number of commits (upper) and number of commits in which a fault was detected (lower) by hour of day of the commit, for Linux. Data from Eyolfson et al Eyolfson_14. code
caption=
Figure 315. Number of commits per hour for weekdays and fitted model (upper) and number of commits in which a fault was detected (lower), for Linux. Data from Eyolfson et al Eyolfson_14. code
caption=
Figure 316. Number of commits per hour for each weekday, fitted using $\cos(...\cos...)$ (upper) and $\cos(...\cos+\sin...)$ (lower), for Linux; in both cases the fitted fault model (red) has been rescaled to allow comparison. Data from Eyolfson et al Eyolfson_14. code
caption=
Figure 317. Application source lines against percentage of covered lines achieved by both Human & Dynodroid tests, by only Dynodroid tests and by only Human tests. Data from Machiry et al Machiry_13. code
caption=
Figure 318. Percentage of source lines covered by both Human & Dynodroid tests, by only Dynodroid tests and by only Human tests; fitted regression line and prediction points for various total source lines, red plus. Data from Machiry et al Machiry_13. code

Other techniques

caption=
Figure 319. Volume of unit sphere in 1 to 50 dimensions, e.g., a sphere has volume $\frac{4}{3}\pi$ in three dimensions. code
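The values plotted in Figure 319 can be reproduced with the standard closed form for the volume enclosed by a unit sphere in $n$ dimensions, $V_n = \pi^{n/2} / \Gamma(n/2 + 1)$. The sketch below is an illustration only, not the code behind the figure; the function name unit_ball_volume is hypothetical.

```python
from math import gamma, pi


def unit_ball_volume(n: int) -> float:
    """Volume enclosed by the unit sphere in n dimensions:
    pi^(n/2) / Gamma(n/2 + 1)."""
    return pi ** (n / 2) / gamma(n / 2 + 1)


if __name__ == "__main__":
    for dim in range(1, 51):
        print(dim, unit_ball_volume(dim))
```

The volume increases up to five dimensions and then shrinks towards zero as the number of dimensions grows.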
caption=
Figure 320. Top levels of the decision tree built from the reopened fault data. Data from Shihab et al Shihab_10a. code
caption=
Figure 321. A Bertin plot for items included in the same data structure as ‘Antibiotics used’, for each subject, after reordering by seriate. Data from Jones Jones_09b. code
caption=
Figure 322. A visualization of the Robinson matrix based on number of times pairs of items co-occur in the same data structure (the closer to the diagonal the more often they occur together). Data from Jones Jones_09b. code

Experiments

caption=
Figure 323. Time taken, by the same person, to implement 12 algorithms from the Communications of the ACM, with four iterations of the implementation process. Data from Zislis Zislis_73. code
caption=
Figure 324. Time taken to transfer and multiply 2-dimensional matrices of various sizes on a GTX 480 GPU. Data kindly supplied by Gregg and Hazelwood Gregg_11. code
caption=
Figure 325. Relative performance (y-axis) of libraries optimized to run on various processors (x-axis). Data from Bird Bird_10. code
caption=
Figure 326. Number of integer constants having the lexical form of a decimal-constant (the literal 0 is also included in this set) and hexadecimal-constant that have a given value. Data from Jones Jones_05a. code
caption=
Figure 327. One and two-sided significance testing. code
caption=
Figure 328. A cube plot of three configuration factors and corresponding benchmark results (blue) from the Memo table experiment. Data from Citron et al Citron_03b. code
caption=
Figure 329. Design plot showing the impact of each Memo table configuration factor on benchmark performance. Data from Citron et al Citron_03b. code
caption=
Figure 330. Interaction plot showing how cint changes with size for given values of associativity and mapping. Data from Citron et al Citron_03b. code
caption=
Figure 331. Number of Reflection benchmark results achieving a given score, reported for GTX 970 cards from three third-party manufacturers. Data extracted from UserBenchmark.com. code
caption=
Figure 332. Density plots of project bids submitted by companies before/after seeing a requirements document. Data from Jørgensen et al Jorgensen_04c. code
caption=
Figure 333. Density plot of task implementation estimates: with no instructions (red) and with instruction on what to do (blue). Data from Jørgensen et al Jorgensen_04. code
caption=
Figure 334. Examples of correlation between samples of two value pairs, plotted on x and y axis. code
caption=
Figure 335. Number of software faults having a given consequence, based on an analysis of faults in Cassandra. Data from Gunawi et al Gunawi_14. code
caption=
Figure 336. Performance and rental cost of early computers, with straight line fits for a few years. Data from Knight Knight_66. code
caption=
Figure 337. Feature size, in Silicon atoms, of microprocessors. Data from Danowitz et al Danowitz_12. code
caption=
Figure 338. Maximum number of records sorted in 1 minute and using 1 penny’s worth of system time (upper). SPEC2006 integer benchmark results (lower). Data from Gray et al Gray_14 and SPEC SPEC_14. code
caption=
Figure 339. Total system power consumed when sorting 10, 20, 30, 40, 50 million integers (colored pluses) using three techniques running on the same processor at different clock frequencies. Data from Götz et al Gotz_14. code
caption=
Figure 340. Power consumed by 10 Atmel SAM3U microcontrollers at various temperatures when sleeping or running. Data from Wanner et al Wanner_10. code
caption=
Figure 341. Power spectrum of electrical power consumed by an app running on a ???. Data from Saborido et al Saborido_15. code
caption=
Figure 342. Read bandwidth at various offsets for new disks sold in 2002 (upper) and 2006 (lower). Data kindly provided by Krevat Krevat_13. code
caption=
Figure 343. Average power consumed by one server’s CPU (four Pentium 4 Xeons; red) and memory (8 GB PC133 DIMMs; blue) running the SPEC CPU2006 benchmark (upper) and breakdown by system component when executing various programs. Data from Bircher Bircher_10. code
caption=
Figure 344. FFT benchmark executed 2,048 times followed by system reboot, repeated 10 times. Data kindly provided by Kalibera Kalibera_05. code
caption=
Figure 345. Percentage change, relative to no environment variables, in perlbench performance as characters are added to the environment. Data extracted from Mytkowicz et al Mytkowicz_08. code
caption=
Figure 346. Changes in SPEC CPU2006 benchmark performance caused by cache and memory bus contention for one dual processor Intel Xeon E5345 system. Data kindly provided by Babka Babka_12. code
caption=
Figure 347. Execution time of 330.art_m, an OpenMP benchmark program, using different compilers, number of threads and setting of thread affinity. Data kindly provided by Mazouz Mazouz_13. code
caption=
Figure 348. Access times when walking through memory using three fixed stride patterns (i.e., 32, 64 and 128 bytes) on a quad-core Intel Xeon E5345; grey lines at one standard deviation. Data kindly provided by Babka Babka_09. code
caption=
Figure 349. Performance variation of programs from the Talos benchmark run on original OS and a stabilised OS. Data from Larres Larres_12. code
caption=
Figure 350. Operations per second of a file-server mounted on one of ext2, ext3, rfs and xfs filesystems (same color for each filesystem) using various options. Data kindly supplied by Huang Zhou_12. code
caption=
Figure 351. Percentage change in SPEC number, relative to version 4.0.4, for 12 programs compiled using six different versions of gcc (compiling to 64-bits with the O3 option). Data from Makarow Makarow_14. code
caption=
Figure 352. Execution time of xy file compressor, compiled by gcc with various optimization options, running on various systems (lines are mean execution time when compiled using each option). Data kindly supplied by Petkovich de_Oliveira_13. code
caption=
Figure 353. Execution time of Perlbench, from the SPEC benchmark, on six systems, when linked in three different orders with address randomization on/off. Data kindly supplied by Reidemeister de_Oliveira_13. code
caption=
Figure 354. Performance of PassMark memory benchmark on 783 Intel Core i7-3770K systems; lower plot created by trimming 10% of values from the ends of what appears in the upper plot. Data kindly supplied by David Wren PassMark_14. code
caption=
Figure 355. Ubench CPU performance on small (upper) and large (lower) EC2 instances, Europe in red and US in green. Data kindly provided by Dittrich Schad_10. code
caption=
Figure 356. Lines of code that 101 professional developers, with a given number of years of experience, estimate they have written. Data from Jones Jones_06a, Jones_08a, Jones_09b. code

Overview of R

caption=
Figure 357. Plot produced by hello_world.R program. code
caption=
Figure 358. The unique bytes per window (256 bytes wide) of a pdf file. code

Data preparation

caption=
Figure 359. Screen height and width reported by 682,000 unique devices that downloaded an app from OpenSignal in 2015 (upper), and the reported measurements ordered so that height is always the larger value (lower). Data from OpenSignal OpenSignal_15. code
caption=
Figure 360. Number of reported vulnerabilities, per day, in the US National Vulnerability Database for 2003. Data from the National Vulnerability Database NVD_14. code
caption=
Figure 361. Percentage occurrence of the first digit of hexadecimal numbers in C source and estimated from Google book data. Data from Jones Jones_05a and Michel et al Michel_11. code
caption=
Figure 362. Number of processes executing for a given amount of time, with measurements expressed using two and six significant digits. Data from Feitelson Feitelson_14. code