Read me 1st

All of the figures from the book, plus links to the code on GitHub and a Google search for the paper from which the data was obtained.

Introduction

caption=
Figure 1. Total cost of one million computing operations over time. Data from Nordhaus Nordhaus_01. code
caption=
Figure 2. Storage cost, in US dollars per Mbyte, of mass market technologies over time. Data from McCallum McCallum_16. code
caption=
Figure 3. Growth of transport and product distribution infrastructure in the USA (underlying data is measured in miles). Data from Grübler et al Grubler_91. code
caption=
Figure 4. Market capitalization of IBM, Microsoft and Apple (top) and expressed as a percentage of the top 100 listed US tech companies (bottom). Data extracted from the Economist website Economist_15. code
caption=
Figure 5. Total annual sales of computer families over the last 60 years. Data from Gordon Gordon_87 (mainframes and minicomputers), Reimer Reimer_12 (PCs) and Gartner Gartner_17 (smartphones). code
caption=
Figure 6. Total investment in tangible and intangible assets by UK companies, based on their audited accounts. Data from Goodridge et al Goodridge_14. code
caption=
Figure 7. Billions of dollars of worldwide semiconductor sales per month. Data from World Semiconductor Trade Statistics WSTs_16. code
caption=
Figure 8. Changing habits in men’s facial hair. Data from Robinson Robinson_76. code
caption=
Figure 9. Number of papers, in each year between 1987 and 2003, associated with a particular IT topic. The E-commerce paper count peaks at 1,775 in 2000 and in 2003 is still off the scale compared to other topics. Data kindly provided by Wang Wang_10. code
caption=
Figure 10. Normal distribution with total percentage of values enclosed within a given number of standard deviations. code
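The percentages shown in Figure 10 can be reproduced from the error function: for a normal distribution, the fraction of values within k standard deviations of the mean is erf(k/√2). The following is a minimal Python sketch (not the book's own code, and the function name `pct_within` is made up for illustration).

```python
import math

def pct_within(k: float) -> float:
    """Percentage of a normal distribution within k standard deviations of the mean."""
    return 100.0 * math.erf(k / math.sqrt(2.0))

# Print the enclosed percentage for 1, 2 and 3 standard deviations.
for k in (1, 2, 3):
    print(f"within {k} sd: {pct_within(k):.2f}%")
```

The result does not depend on the mean or the standard deviation, because k is measured in units of the standard deviation.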

Human cognitive characteristics

caption=
Figure 11. Unless cognition and the environment in which it operates closely mesh together, no problems are solved; the blades of a pair of scissors need to closely mesh for cutting to occur. code
caption=
Figure 12. The assumption of light shining from above creates the appearance of bumps and pits. Could be more convincing hemispheres with light shining from above and below… code
caption=
Figure 13. Probability that rat N1 will press a lever a given number of times before pressing a second lever to obtain food, when the target count is 4, 8, 12 and 16. Data extracted from Mechner Mechner_58. code
caption=
Figure 14. Boy/girl (aged 11-12 years) verbal reasoning, quantitative reasoning, non-verbal reasoning and mean CAT score over the three tests; each stanine band is 0.5 standard deviations wide. Data from Strand et al Strand_06. code
caption=
Figure 15. Rotate text in the real world, by tilting the head, or in the mind? code
caption=
Figure 16. Two objects paired with another object that may be a rotated version. Based on Shepard et al Shepard_71. code
caption=
Figure 17. Error rate, with standard error, for the left/right hand in a study of the SNARC effect. Data from Nuerk et al Nuerk_05. code
caption=
Figure 18. Structure of mammalian long-term memory subsystems; brain areas in red. Based on Squire et al Squire_15.
caption=
Figure 19. Percentage correct answers to questions about binary operator precedence against occurrence in source code. Data from Jones Jones_06a. code
caption=
Figure 20. Response time (left axis) and error percentage (right axis) on reasoning task with given number of digits held in memory. Data extracted from Baddeley Baddeley_09. code
caption=
Figure 21. Major components of working memory: working memory in yellow, long-term memory in orange. Based on Baddeley Baddeley_12. code
caption=
Figure 22. Yes/no response time (in milliseconds) as a function of the number of digits held in memory. Data extracted from Sternberg Sternberg_69. code
caption=
Figure 23. Parse tree of a sentence with no embedding, upper "S 1", and a sentence with four degrees of embedding, lower "S 4". Based on Miller et al Miller_64. code
caption=
Figure 24. Sequencing errors (as percentage) after interruptions of various length (red), including 95% confidence intervals, normal sequence error rate in green; lines are fitted model predictions. Data from Altmann et al Altmann_17. code
caption=
Figure 25. Semantic memory representation of alphabetic letters (the numbers listed along the top are place markers and are not stored in subject memory). Readers may recognize the structure of a nursery rhyme in the letter sequences. Derived from Klahr Klahr_83. code
caption=
Figure 26. Probability of correct recall of words by serial presentation order (each word visible for 1 or 2 seconds, last digit in legend). Data extracted from Murdoch Murdoch_62. code
caption=
Figure 27. Time taken to solve the same jig-saw puzzle 35 times, followed by a two-week interval and then another 35 times, with power law and exponential fits. Data extracted from Alteneder Alteneder_35. code
caption=
Figure 28. Completion times of eight solo (upper) and eight pairs (lower) for each implementation round, along with fitted equation…. Data kindly provided by Lui Lui_06. code
caption=
Figure 29. Subjects’ belief response curves for positive weak–strong, negative weak–strong, and positive–negative evidence. Based on Hogarth et al Hogarth_92. code
caption=
Figure 30. Country boundaries distort judgement of relative city locations. Based on Stevens et al Stevens_78.
caption=
Figure 31. Orthogonal representation of shape, color and size stimuli. Based on Shepard Shepard_61.
caption=
Figure 32. The six unique configurations of selecting four times from eight possibilities, i.e., it is not possible to rotate one configuration into another within these six configurations. Based on Shepard Shepard_61.
caption=
Figure 33. Percentage of correct answers given by one subject, against boolean-complexity of category, colored by number of positive cases needed to define the category. Data kindly provided by Feldman Feldman_00. code
caption=
Figure 34. The Berlin and Kay Berlin_69 language color hierarchy. The presence of any color term in a language implies the existence, in that language, of all terms below it. Papuan Dani has two terms (black and white), while Russian has eleven (Russian may also be an exception in that it has two terms for blue.) code
caption=
Figure 35. Cup- and bowl-like objects of various widths (ratios 1.2, 1.5, 1.9, and 2.5) and heights (ratios 1.2, 1.5, 1.9, and 2.4). The percentage of subjects who selected the term cup or bowl to describe the object they were shown (the paper did not explain why the figures do not sum to 100%). Based on Labov Labov_73. code
caption=
Figure 36. A commercial event involving a buyer, seller, money, and goods; as seen from the buy, sell, pay, or charge perspective. Based on Fillmore Fillmore_77. code
caption=
Figure 37. Lines of code correctly recalled after a given number of 2 minute memorization sessions; upper plot actual program, lower plot line order scrambled. Data extracted from McKeithen et al McKeithen_81. code
caption=
Figure 38. Examples of features that may be preattentively processed (parallel lines and the junction of two lines are the odd ones out). Based on Ware Ware_00.
caption=
Figure 39. Continuity: upper left plot is perceived as two curved lines; Closure: when the two perceived lines are joined at their end (upper right), the perception changes to one of two cone-shaped objects; Symmetry and parallelism: where the direction taken by one line follows the same pattern of behavior as another line; Proximity: the horizontal distance between the dots in the lower left plot is less than the vertical distance, causing them to be perceptually grouped into lines (the relative distances are reversed in the right plot); Similarity: a variety of dimensions along which visual items can differ sufficiently to cause them to be perceived as being distinct; rotating two line segments by 180° does not create as big a perceived difference as rotating them by 45°; TODO look good. code
caption=
Figure 40. Perceived grouping of items on a line may be by shape, color or proximity. Based on Kubovy et al kubovy_08. code
caption=
Figure 41. Examples of unique items among visually similar items. Those at the top include an item that has a distinguishing feature (a vertical line or a gap); those underneath them include an item that is missing this distinguishing feature. Based on displays used by Treisman et al Treisman_85. code
caption=
Figure 42. The foveal, parafoveal and peripheral vision regions when three characters visually subtend 3°. Based on Schotter et al Schotter_12. code
caption=
Figure 43. Local context can change the interpretation given to the surrounding symbols. code
caption=
Figure 44. Example object layout and the corresponding ordered tree produced from the answers given by one subject. Data extracted from McNamara et al McNamara_89. code
caption=
Figure 45. Heat map of one subject’s cumulative fixations (black dots) on a screen image. Data kindly provided by Ali Ali_12. code
caption=
Figure 46. The four cards used in the Wason selection task. Based on Wason Wason_68. code
caption=
Figure 47. Probability a subject will successfully distinguish a difference between the number of dots displayed and a specified target number (x-axis is the difference between these two values). Data extracted from van Oeffelen et al van_Oeffelen_82. code
caption=
Figure 48. Line locations chosen for the numeric values seen by each of four subjects; color of fitted loess line changes at one million boundary. Data kindly provided by Landy Landy_17. code
caption=
Figure 49. Number of errors, in 132 simple multiplication trials (e.g., $3\times7$), upper plot shows operand values (a loess fit in yellow) and lower plot result value (points where both operands have the same value are in blue). Data from Campbell Campbell_97. code
caption=
Figure 50. One subject’s response time over successive blocks of command line trials and fitted loess (in green). Data kindly provided by Remington Remington_16. code
caption=
Figure 51. Risk neutral (green, $u(w)=w$), risk loving (red, quadratic) and risk averse (blue, square-root) utility functions. code
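The three attitudes to risk in Figure 51 can be told apart by comparing the expected utility of a gamble with the utility of receiving the gamble's expected value for certain. The Python sketch below assumes the quadratic form is w² and the square-root form is √w (the caption gives only u(w)=w explicitly), and the helper `compare` is hypothetical.

```python
import math

# Utility functions; the quadratic and square-root forms are assumed.
utilities = {
    "risk neutral": lambda w: w,
    "risk loving": lambda w: w ** 2,
    "risk averse": lambda w: math.sqrt(w),
}

def compare(u, low=0.0, high=100.0):
    """Return (expected utility of a 50/50 gamble on low/high,
    utility of the sure expected amount)."""
    gamble = 0.5 * u(low) + 0.5 * u(high)
    sure = u(0.5 * (low + high))
    return gamble, sure

for name, u in utilities.items():
    gamble, sure = compare(u)
    print(f"{name}: gamble={gamble:.2f} sure={sure:.2f}")
```

A risk-neutral subject is indifferent between the two, a risk-loving subject values the gamble more, and a risk-averse subject values the sure amount more.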
caption=
Figure 52. Subjects' estimate of their ability (x-axis) to correctly answer a question and actual performance in answering on the left scale. The responses of a person with perfect self-knowledge are given by the solid line. Data extracted from Lichtenstein et al Lichtenstein_77. code
caption=
Figure 53. Each row shows a scaled version of the three stripes, along with actual lengths in inches, from which subjects were asked to select the longest. Based on Asch Asch_56. code

Cognitive capitalism

caption=
Figure 54. Company revenue (in millions of dollars) against total software development costs. Data from Mulford et al Mulford_16. code
caption=
Figure 55. Average Return On Invested Capital of various U.S. industries between 1992 and 2006. Data from Porter Porter_08. code
caption=
Figure 56. Ratio of actual to estimated hours of effort to enhance an existing product, for 25 versions of one application. Data from Huijgens et al Huijgens_16. code
caption=
Figure 57. Accounting practice for breaking down income from sales… code
caption=
Figure 58. Average effort (in days) used to fix a defect detected in a given phase (x-axis) that had been introduced in an earlier phase (colored lines); a total of 38,120 defects in projects at Hughes Aircraft. Data extracted from Willis et al Willis_98. code
caption=
Figure 59. Months of developer effort needed to produce systems containing a given number of lines of code… Data from Gayek et al Gayek_04. code
caption=
Figure 60. Introductory price and benchmark performance of various Intel processors between 2003 and 2013. Data from Sun Sun_14. code
caption=
Figure 61. Example supply and demand curves. code
caption=
Figure 62. Rates at which product sales are made on Gumroad at various prices; lines join prices that differ by 1¢, e.g., $1.99 and $2. Data from Nichols Nichols_13. code
caption=
Figure 63. Growth of GitHub users during its first 58 months. Data from Irving Irving_16. code
caption=
Figure 64. Sales of game software (solid lines) for the corresponding three major seventh generation hardware consoles (dotted lines). Data from VGChartz VGChartz_17. code
caption=
Figure 65. Percentage of sales closed in a given week of a quarter, with average discount given. Data from Larkin Larkin_13. code
caption=
Figure 66. Facebook’s ARPU and cost of revenue per user. Data from Facebook’s 10-K filings Facebook_14, Facebook_16. code
caption=
Figure 67. Top 100 software companies ranked by total revenue (in millions of dollars) and ranked by Software-as-a-Service revenue. Data from PwC PwC_13, PwC_14, PwC_16. code
caption=
Figure 68. Various vendors’ retail and upgrade prices for C and C++ compilers available under MS-DOS and Microsoft Windows between 1987 and 1998. Data kindly provided by Viard Viard_07. code
caption=
Figure 69. Difference between… Data from Bayus et al Bayus_01. code

Ecosystems

caption=
Figure 70. Total gigabytes of DRAM shipped world-wide in a given year, along with shipments by device capacity (in bits). Data from Victor et al Victor_02. code
caption=
Figure 71. Annual percentage of shipped mobile phone operating systems. Data from Reimer Reimer_12 (before 2007) and Gartner Gartner_17 (after 2006). code
caption=
Figure 72. Maximum speed achieved by vehicles over the surface of the Earth and in the air, over time. Data from Lienhard Lienhard_06. code
caption=
Figure 73. Number of transistors, frequency and SPEC performance of cpus when first launched. Data from Danowitz et al Danowitz_12. code
caption=
Figure 74. Number of process model change requests made in three years of a banking Customer registration project. Data kindly supplied by Branco Branco_12. code
caption=
Figure 75. Total instructions in the software shipped with various models of IBM computer, plus Datatron from Burroughs. Data extracted from Naur et al Naur_69. code
caption=
Figure 76. Total value of custom and packaged software (hardware vendor+third-party) sales in the US. Data from Phister Phister_79. code
caption=
Figure 77. Estimated number of comments written in German, in the LibreOffice source code. Data from Meeks Meeks_17. code
caption=
Figure 78. Percentage of function definitions in embedded applications, the SPECint95 benchmark???, and the translated form of C source benchmark programs declared to have a given number of parameters. Data for embedded and SPECint95 kindly supplied by Engblom Engblom_99a, C book data from Jones Jones_05a. code
caption=
Figure 79. Hours required to build a car radio after the production of a given number of radios, with break periods (shown in days above x-axis); lines are models fitted to each production period. Data extracted from Nembhard et al Nembhard_01. code
caption=
Figure 80. Man-hours required to build a particular kind of ship, at the Delta Shipbuilding yard, delivered on a given date (x-axis). Data from Thompson Thompson_07. code
caption=
Figure 81. Total computer systems purchased and rented by the US Federal Government in the respective fiscal years ending June 30. Data from US Government General Accounting Office Staats_71. code
caption=
Figure 82. Yearly development cost and lines of code delivered to the US Air Force between 1960 and 1986. Data extracted from NeSmith NeSmith_86. code
caption=
Figure 83. Total sales of various kinds of processors. Data from Hilbert et al Hilbert_11. code
caption=
Figure 84. Monthly unit sales (in millions) of microprocessors having a given bus width. Data kindly supplied by Turley Turley_02. code
caption=
Figure 85. TSMC revenue from wafer production, as a percentage of total revenue, at various line widths. Data from TSMC TSMC_17. code
caption=
Figure 86. Number of new UK companies registered each month, whose SIC description includes the word software or computer (case not significant). Data extracted from OpenCorporates OpenCorporates_15. code
caption=
Figure 87. Connections between companies in a Dutch software business network. Data kindly provided by Crooymans Crooymans_15. code
caption=
Figure 88. Reported worldwide software industry Mergers and Acquisitions (M&A). Data from Solganick Solganick_16. code
caption=
Figure 89. Estimated percentage of commercial membership of IETF committee that took a given number of days to agree to publish an RFC. Data from Simcoe Simcoe_13. code
caption=
Figure 90. Percentage of employment by US industry sector 1850-2009. Data kindly provided by Kossik Kossik_11. code
caption=
Figure 91. Total value of bug bounties earned by researchers between 2014 and 2016. Data from Maillart et al Maillart_16. code
caption=
Figure 92. Decade in which newly designed US Air Force aircraft first flew, with colors indicating current operational status. Data from Eckbreth et al Eckbreth_11. code
caption=
Figure 93. Daily minutes spent using an App, from Apple’s AppStore, … Data extracted from Ansar <book Ansar_1?>. code
caption=
Figure 94. Number of optional features selected by a given number of flags. Data kindly provided by Berger Berger_12. code
caption=
Figure 95. Cumulative percentage of configuration options impacting a given number of source files in the Linux kernel. Data kindly provided by Ziegler Ziegler_16. code
caption=
Figure 96. Ratio of development costs to total five-year maintenance costs for 158 IBM software systems sorted by size; curve is a beta distribution fitted to the data (in red). Data from Dunn Dunn_11. code
caption=
Figure 97. Number of software systems surviving to a given number of years and exponential equation fits. Data from Tamai Tamai_92. code
caption=
Figure 98. Percentage of patches submitted to WebKit (34,535 in total) transitioning between various stages of code review. Data from Baysal et al Baysal_13. code
caption=
Figure 99. Number of forked projects identified in Wikipedia during August 2011. Data from Robles et al Robles_12b.
caption=
Figure 100. Percentage of code ported from NetBSD to various versions of OpenBSD, broken down by version of NetBSD in which it first occurred (denoted by incrementally changing color). Data kindly provided by Ray Ray_13.
caption=
Figure 101. Survival curve for Linux distributions derived from various widely-used distributions. Data from Lundqvist et al Lundqvist_12. code
caption=
Figure 102. Survival curve for packages included in the standard Debian distribution. Data from Caneill et al Caneill_14. code
caption=
Figure 103. Number of pdf files created using a given version of the portable document format appearing on sites having a .uk web address between 1996 and 2010. Data from Jackson Jackson_12. code
caption=
Figure 104. Percentage share of total Android market at days since launch for various versions of Android. Data from Villard Villard_15. code
caption=
Figure 105. Words in Intel x86 architecture manuals and code-points in Unicode Standard over time. Data kindly provided by Baumann Baumann_16. code
caption=
Figure 106. Number of gcc compiler flags and options over time, and fitted regression models. Data from Fursin et al Fursin_14. code
caption=
Figure 107. Number of monthly developer job related tweets specifying a given language. Data kindly provided by Destefanis Destefanis_14. code
caption=
Figure 108. Number of projects making use of a given number of different languages in a sample of 100,000 GitHub projects. Data kindly supplied by Bissyande Bissyande_13. code
caption=
Figure 109. Ranked order of the number of Android apps (1.1 million) and Ubuntu packages (71,199) linking to each supported POSIX function. Data from Atlidakis et al Atlidakis_16. code
caption=
Figure 110. Survival curves for Debian package lifetime and for a package to contain its first dependency conflict. Data from Drobisz et al Drobisz_15. code
caption=
Figure 111. Dependencies between the Java packages in various versions of ANTLR. Data from Al-Mutawa Al-Mutawa_13. code
caption=
Figure 112. Fraction of source in 130 releases of Linux (x-axis) that originates in an earlier release (y-axis). Data extracted from png file kindly supplied by Matsushita Livieri_07. code
caption=
Figure 113. Number of functions (in Evolution; the points at zero are incorrect counts) modified a given number of times (upper) or modified by a given number of different people (lower); red line is a straight line fit, green line a quadratic fit. Data from Robles et al Robles_12a. code
caption=
Figure 114. Number of functions (in Evolution) modified a given number of times broken down by number of authors. Data from Robles et al Robles_12a. code
caption=
Figure 115. Density plot of time interval, in hours, between each modification of a function in Evolution. Data from Robles et al Robles_12a. code
caption=
Figure 116. Survival curves of clones in the Linux high/medium/low level SCSI subsystems. Data from Wang Wang_12. code
caption=
Figure 117. Number of identifiers renamed, each month, in the source of Eclipse-JDT; version released on given date shown. Data from Eshkevari et al Eshkevari_11. code
caption=
Figure 118. Changes in the number of tables in the Mediawiki and Ensembl project database schema over time. Data from Skoulis Skoulis_13. code
caption=
Figure 119. Survival curve for tables in Mediawiki and Ensembl database schema. Data from Skoulis Skoulis_13. code

Projects

caption=
Figure 120. Percentage profit/loss on fixed-price software development contracts… Data extracted from … code
caption=
Figure 121. Commits within a particular hour and day of week for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
caption=
Figure 122. Cone of uncertainty in estimated cost with constant accuracy and costs per time interval (top left), with 1% improvement in accuracy in each time interval (bottom left/right), and with three different spends per time interval, c(0.5*(0:30), 15+1.5*(1:30), 60+1:40), (right top/bottom). code
caption=
Figure 123. Estimated effort against actual effort (in hours). Data from Jørgensen Jorgensen_04b. code
caption=
Figure 124. Quoted bid price and estimated effort from 14 companies… Data from Anda et al Anda_09. code
caption=
Figure 125. Percentage difference in two estimates for the same six projects made by seven developers… Data from Grimstad et al Grimstad_07. code
caption=
Figure 126. Actual project duration against number of schedule estimates made for it. Data from Little Little_06. code
caption=
Figure 127. Distribution of effort (person hours) during the development of four engine control systems projects, plus non-project work and holidays, at Rolls-Royce. Data extracted from Powell Powell_01. code
caption=
Figure 128. Phase during which work on a given phase of development was actually performed. Data from Zelkowitz Zelkowitz_88. code
caption=
Figure 129. Average value assigned to requirements (red) and one standard deviation bands (blue) based on omitting one stakeholder’s value list. Data from Regnell et al Regnell_01. code
caption=
Figure 130. Number of requirements added/deleted/modified in 22 releases of a product containing eight features (upper) and total number of requirements against requirements changed for those eight features (lower). Data extracted from Felici Felici_04. code
caption=
Figure 131. Pagerank of the stakeholder nodes in the network created from the Open (green) and Closed (blue) stakeholder responses (values for each have been sorted). Data from Lim Lim_10. code
caption=
Figure 132. Average number of feature implementations started (blue) and their average duration (red); a 30 day rolling mean has been applied to both. Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 133. Number of features whose implementation took a given number of elapsed workdays. Top first 650 days, bottom after 650 days. Green line is the fitted negative binomial distribution. Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 134. Number of feature developments started on a given work day (red bug fixes, blue non-bug work, black ratio of two values; 20 day rolling mean bottom left, 50 day top right, 120 day bottom right). Data kindly supplied by 7Digital 7Digital_12. code

Reliability

caption=
Figure 135. Transition counts of the order in which five distinct faults were discovered in 50 runs of Program A2. Data from Nagel et al Nagel_82. code
caption=
Figure 136. Number of input cases that occurred before a particular fault was experienced by program A2; the list was sorted for each fault. Data from Nagel et al Nagel_82. code
caption=
Figure 137. Number of accesses to memory address blocks, per 100,000 instructions, executing gzip on two different inputs. Data from Brigham Young Brigham_Young via Feitelson. code
caption=
Figure 138. Number of incidents reported in each of 800 applications installed on over 120,000 desktop machines. Data from Lucente Lucente_15. code
caption=
Figure 139. Power analysis (50 and 10 runs at various p-values) of detecting a difference between two runs having a binomial distribution (runs needed to achieve power=0.8 at various p-values). code
caption=
Figure 140. Percentage of usability problems found by a given number of test subjects. Data extracted from Nielsen et al Nielsen_93. code
caption=
Figure 141. Problems reported in the POSIX standard by problem classification. Data kindly provided by Josey OpenGroup_17. code
caption=
Figure 142. Survival rate of faults in Linux device drivers and other Linux subsystems… Data from Palix et al Palix_10b. code
caption=
Figure 143. Defects found against hours of testing… Data from Wood Wood_96. code
caption=
Figure 144. Percentage of reported problems having a given mean time to first problem occurrence (in months, summed over all installations of a product) for nine products. Data from Adams Adams_84. code
caption=
Figure 145. Survival curve of the two most common warnings reported by Splint in Samba and Squid. Data from De Penta et al Di_penta_09. code
caption=
Figure 146. Reported faults against number of installations (upper) and age (lower)… Data from the "wheezy" version of Debian UDD_14. code
caption=
Figure 147. Number of various kinds of fault found during code review of nine implementations of the same specification and how located. Data extracted from Finifter Finifter_13b. code
caption=
Figure 148. Input case on which a failure occurred, for a total of 500,000 inputs. Data from Dunham et al Dunham_86. code
caption=
Figure 149. Number of input cases processed before a given fault is experienced. Data from Dunham et al Dunham_86. code
caption=
Figure 150. Number of input cases processed before a given number of program failures is experienced; 25 replications. Data from Dunham et al Dunham_86. code
caption=
Figure 151. Time taken, in 10 distinct runs, to discover a thread safety violation in 22 different Java classes. Data kindly supplied by Pradel Pradel_12. code
caption=
Figure 152. Fraction of mutated programs, in various languages, that successfully compiled/executed/produced same output. Data from Spinellis et al Spinellis_12. code
caption=
Figure 153. Total number of failures per 30-day interval for each LANL system. Data from Los Alamos National Lab (LANL). code
caption=
Figure 154. Total number of failures for each node in the given LANL system. Data from Los Alamos National Lab (LANL). code
caption=
Figure 155. For systems 2 and 18, number of uptime intervals, binned into 10 hour intervals, red line is fitted negative binomial distribution. Data from Los Alamos National Lab (LANL). code
1014.png
Margin Fault slip throughs for a development project at Ericsson (left column lists when fault could have been detected, bottom row when fault was detected). Data from Hribar Hribar_08. code
caption=
Figure 156. Various test suite coverage measures and mutants killed in 300 or so Java projects; black line is a loess fit. Data from Gopinath et al Gopinath_14. code
caption=
Figure 157. Statement (triangles) and branch (stars) coverage achieved using a program’s test suite… Data from Marinescu et al Marinescu_14. code
caption=
Figure 158. Amount of source (millions of lines) in each version broken down by the version in which it first appears. Data extracted from Massacci et al Massacci_11. code
caption=
Figure 159. Market share of Firefox versions between official release and end-of-support. Data from w3schools.com. code
caption=
Figure 160. Number of people with Internet access per 100 head of population in the developed world and the whole world. Data from ITU ITU_12. code
caption=
Figure 161. Amount of end-user usage of code originally written for Firefox version 1.0 by various other versions. Data extracted from Massacci et al Massacci_11. code

Source code

caption=
Figure 162. Boxplot of ratings given to snippets 1 to 50 by second year students (colors used to help distinguish boxplots for each snippet). code
caption=
Figure 163. Aggregated ranking of snippets by subjects in years 1 and 2 (red and black) and years 2 and 4 (black and blue). Snippets have been sorted by year 2 ranking. code
caption=
Figure 164. Correlation, using Kendall’s tau, between each subject and their corresponding year aggregate ranking. code
caption=
Figure 165. Number of files and lines of code in 3,782 projects on Sourceforge. Data from Herraiz Herraiz_08. code
caption=
Figure 166. Total number of C functions measured, their total unused parameters and two fitted models. Data from Jones <book Jones_??>. code
caption=
Figure 167. Occurrences of sequences of java.lang.StringBuilder methods called on the same object in 11 GB of Java bytecode. Data from Mendez et al Mendez_13. code
caption=
Figure 168. For each class the percentage of method sequences containing a given number of calls (in 11 GB of Java bytecode). Data from Mendez et al Mendez_13. code
caption=
Figure 169. Number of commits of a given length, in lines added/deleted to fix various faults in Linux file systems. Data from Lu et al Lu_13. code
caption=
Figure 170. "Worth estimate" for identifier visibility ordering preferences declarations within a Java class. Data from Biegel et al Biegel_12. code
caption=
Figure 171. "Worth estimate" for the kind of method activity attribute. Data from Biegel et al Biegel_12. code
caption=
Figure 172. Number of method calls to Java APIs and non-APIs in 6,286 Open source projects. Data from Lämmel et al Lammel_11. code
caption=
Figure 173. Percentage occurrence of values appearing as the most significant digit of floating-point, integer and hexadecimal literals in C source code. Data from Jones Jones_05a. code
caption=
Figure 174. Lines of code, Halstead’s volume and cyclomatic complexity of Linux version 2.6.9. Data from Israeli et al Israeli_10. code
caption=
Figure 175. Number of feature constants against LOC for 40 large C programs and two fitted regression lines (red and green; blue is one confidence interval). Data from Liebig et al Liebig_10. code

Stories told by data

caption=
Figure 176. Years of professional experience in a given language for experimental subjects. Data from Prechelt Prechelt_07. code
caption=
Figure 177. Plots of sample values having various visual patterns. code
caption=
Figure 178. Total number of lines of C code, in .c and .h files, having a given length, i.e., containing a given number of characters (upper) and tokens (lower). Data from Jones Jones_05a. code
caption=
Figure 179. Various measurements of work performed implementing the same functionality, number of lines of Haskell and C implementing functionality, CFP (COSMIC function points; based on user manual) and length of formal specification. Data kindly provided by Staples Staples_13. code
caption=
Figure 180. Effort, in hours (log scale), spent in various development phases of projects written in Ada (blue) and Fortran (red). Data from Waligora et al Waligora_95. code
caption=
Figure 181. Performance of experts (e) and novices (n) in a test driven development experiment. Data from Muller et al Muller_07. code
caption=
Figure 182. Correlations between pairs of attributes of 12,799 Github pull requests to the Homebrew repo, represented using colored ellipses. Data from Gousios et al Gousios_14. code
caption=
Figure 183. Correlations between pairs of attributes of 12,799 Github pull requests to the Homebrew repo, represented using pie charts and shaded boxes. Data from Gousios et al Gousios_14. code
caption=
Figure 184. Hierarchical cluster of correlation between pairs of attributes of 12,799 Github pull requests to the Homebrew repo. Data from Gousios et al Gousios_14. code
caption=
Figure 185. Effort invested in project definition (as percentage of original estimate) against cost overrun (as percentage of original estimate). Data extracted from Gruhl Gruhl_9x. code
caption=
Figure 186. Relative clock frequency of cpus when first launched (1970 == 1). Data from Danowitz et al Danowitz_12. code
caption=
Figure 187. Year and age at which survey respondents started contributing to FLOSS, i.e., made their first FLOSS contribution. Data from Robles et al Robles_14. code
caption=
Figure 188. SPECint results, summed over all distinct values (upper) and summed within equal width bins (lower). Data from SPEC website SPEC_14. code
caption=
Figure 189. Kernel density plot of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
caption=
Figure 190. Number of commits containing a given number of lines of code made when making various categories of changes to the Linux filesystem code (upper) and a density plot of the same data (lower). Data from Lu et al Lu_13. code
caption=
Figure 191. Three commonly used kernel density smoothing functions: gaussian, rectangular and triangular. code
caption=
Figure 192. Developer estimated effort against actual effort (in hours), for various maintenance tasks, e.g., adaptive, corrective and perfective; upper as-is, middle jittered values and lower size proportional to the log of the number of measurements. Data from Hatton Hatton_07. code
caption=
Figure 193. Number of installations of Debian packages against the age of the package; middle plot was created by smoothScatter and lower plot by contour. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
caption=
Figure 194. Number of lines added to glibc each week. Data from González-Barahona et al Gonzalez-Barahona_14. code
caption=
Figure 195. Boxplot of time between a bug in Eclipse being reported and the first response to the report; right plot is notched. Data from Breu et al Breu_10. code
caption=
Figure 196. Violin plots (left using vioplot, right using beanplot) of time between bug being reported in Eclipse and first response to the report. Data from Breu et al Breu_10. code
caption=
Figure 197. Time taken for developers to debug various programs using batch processing or online (i.e., time-sharing) systems. Data kindly provided by Prechelt Prechelt_99a. code
caption=
Figure 198. Pairs of languages used together in the same GitHub project with connecting line width, color and transparency related to number of occurrences. Data kindly supplied by Bissyande Bissyande_13. code
caption=
Figure 199. References from one document to another in the Microsoft Server Protocol specifications. Data extracted by the author from the 2009 document release WSPP_15. code
caption=
Figure 200. Alluvial plot of relative prioritization order of selection and application of Github pull requests. Data from Gousios et al Gousios_15a. code
caption=
Figure 201. Intel Sandy Bridge L3 cache bandwidth in GB/s at various clock frequencies and using combinations of cores (0-3 denotes cores zero-through-three, 0,2,4 denotes the three cores zero, two and four). Data from Schone et al Schone_12. code
caption=
Figure 202. Contour plot of the number of sessions executed on a computer having a given processor speed and memory capacity. Data kindly provided by Thereska Thereska_10. code
caption=
Figure 203. Root source of 1,257 faults and where fixes were applied for 21 large safety critical applications. Data from Hamill et al Hamill_14. code
caption=
Figure 204. Ternary plots drawn with two possible visual aids for estimating the position of a point (red plus at x=0.1, y=0.35, z=0.55); axis names appear on the vertex opposite the axis they denote. code
caption=
Figure 205. Earth relative positions of NASA’s Orbview-2 spacecraft when it experienced a single event upset (in blue) on 12 July 2000. Data kindly provided by LaBel Poivey_03. code
caption=
Figure 206. Estimated market share of Android devices by brand and product, based on downloads from 682,000 unique devices in 2015. Data from OpenSignal OpenSignal_15. code
caption=
Figure 207. Variables having a given number of read accesses, given 25, 50, 75 and 100 total accesses, calculated from running the weighted preferential attachment algorithm (red), the smoothed data (blue) and a fitted exponential (green). code
caption=
Figure 208. Throughput when running the SPEC SDM91 benchmark on a Sun SPARCcenter 2000 containing 8 CPUs, with the predictions from three fitted queuing models. Data from Gunther Gunther_05. code
caption=
Figure 209. Illustration of the difference in cognitive effort needed to locate points differing by shape or color. code
caption=
Figure 210. The three, seven and twelve color palettes returned by calls to the diverge_hcl, sequential_hcl, rainbow_hcl and rainbow functions. code
caption=
Figure 211. Percentage share of the Android market by successive Android releases between 2010 and 2015. Data from Villard Villard_15. code
caption=
Figure 212. Values plotted using a linear (upper) and logarithmic (lower) x-axis. Data from Dunham et al Dunham_86. code
caption=
Figure 213. Illustration of U-shape created when y-axis values are a ratio calculated from x-axis values. code
caption=
Figure 214. Mean time to fail for systems of various sizes (measured in lines of code); linear y-axis left, log y-axis right. Data extracted from Figure 8.3 of Putnam et al Putnam_92. code
caption=
Figure 215. Alternative representation of numeric values in Table. Data from Scott Scott_16. code
caption=
Figure 216. What’s up doc? Not the fitted model you were expecting. Equations from White White_12. code

Probability

caption=
Figure 217. Probability that three (red) or four (blue) consecutive false positive warnings occur in some total number of warnings (false positive rate appears on line). code
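A minimal sketch of how a probability of the kind plotted in Figure 217 could be calculated. It assumes each warning is independently a false positive at a fixed rate `p`; this independence is an assumption made here, not something the caption states, and the function name `prob_run` is hypothetical.

```python
def prob_run(n, k, p):
    """Probability of at least one run of k consecutive false positives
    in n warnings, assuming independent warnings with false positive rate p."""
    # state[j]: probability that the current trailing run has length j
    # and no run of length k has occurred so far.
    state = [1.0] + [0.0] * (k - 1)
    hit = 0.0  # probability that a run of length k has already occurred
    for _ in range(n):
        new_state = [0.0] * k
        for j, prob in enumerate(state):
            new_state[0] += prob * (1 - p)    # true positive resets the run
            if j + 1 == k:
                hit += prob * p               # run reaches length k
            else:
                new_state[j + 1] += prob * p  # run grows by one
        state = new_state
    return hit


three_in_a_row = prob_run(20, 3, 0.5)
four_in_a_row = prob_run(20, 4, 0.5)
```

Calling `prob_run` over a range of `n` values, for several false positive rates, would give one curve per rate.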
caption=
Figure 218. The relationship between words for tracts of trees in various languages. The interpretation given to words (boundary indicated by the zigzags) in one language may overlap that given in other languages. Adapted from DiMarco et al DiMarco_93.
caption=
Figure 219. Relationships between common discrete and continuous probability distributions.
caption=
Figure 220. Shapes of commonly encountered discrete probability distributions (upper to lower: Uniform, Geometric, Binomial and Poisson). code
caption=
Figure 221. Cumulative density plots of the discrete probability distributions in Figure. code
caption=
Figure 222. Commonly encountered continuous probability distributions (upper to lower: Uniform, Exponential, Normal, Beta). code
caption=
Figure 223. Samples of randomly selected values drawn from the same normal distribution (left: 100 points in each sample, right: 1,000 points in each sample). code
caption=
Figure 224. Reading rate for text printed using a serif (blue) and sans-serif (red) font; data has been normalised and displayed as a density. Data from Veytsman et al Veytsman_12. code
caption=
Figure 225. Probability, with 95% confidence, that shapiro.test correctly reports that samples drawn from various distributions are not drawn from a Normal distribution, and probability of an incorrect report when the sample is drawn from a Normal distribution. code
caption=
Figure 226. Number of conditionally compiled code sequences dependent on a given number of feature macros (red overwritten by blue: Linux, blue: FreeBSD). Data from Berger et al Berger_10. code
caption=
Figure 227. Percentage occurrence of statements for each of 100 or so C, C++ and Java programs, plotted as a density on the y-axis. Data from Zhu et al Zhu_15. code
caption=
Figure 228. A Cullen and Frey graph for the $3n+1$ program length data. Data kindly provided by van der Meulen van_der_Meulen_07. code
caption=
Figure 229. Number of $3n+1$ programs containing a given number of lines and four distributions fitted to this data. Data kindly provided by van der Meulen van_der_Meulen_07. code
caption=
Figure 230. A zero-truncated Negative Binomial distribution fitted to the number of features whose implementation took a given number of elapsed workdays; first 650 days used. Data kindly provided by 7digital 7Digital_12. code
caption=
Figure 231. Density plot of MPI micro-benchmark runtime performance for calls to MPI_Scan with 10,000 Bytes (upper) and to MPI_Allreduce with 1,000 Bytes (lower). Data kindly supplied by Hunold Hunold_14. code
caption=
Figure 232. Mixture model fitted by the normalmixEM function to the performance data from calls to MPI_Allreduce. Data kindly supplied by Hunold Hunold_14. code
caption=
Figure 233. Density plot of accesses to one article on Slashdot, in minutes since its publication. The distinct Normal distributions (colored and fitted to the log of the data) contained in the mixture models fitted by the REBMIX (upper) and normalmixEM (lower) functions. Data kindly supplied by Kaltenbrunner Kaltenbrunner_07. code
caption=
Figure 234. Cumulative probability distribution of file size (red) and of number of bytes occupied in a file system (blue). Data from Irlam Irlam_93. code
caption=
Figure 235. Graph of available state transitions for Alaris volumetric infusion pump (the button presses that cause transitions between states are not shown). Data kindly supplied by Oladimeji Oladimeji_08. code
caption=
Figure 236. Discrete-time Markov chain for created/modified/deleted status of Linux kernel files at each major release from versions 2.6.0 to 2.6.39. Data from Tarasov Tarasov_12. code
caption=
Figure 237. Directed graph of emails between FreeBSD and OpenBSD developers, plus a few people involved in both discussions, with developers who sent/received fewer than four emails removed. Data from Canfora et al Canfora_11. code
caption=
Figure 238. Expected probability of a single instance (y-axis) against the probability of a measured struct type having grouped member types (x-axis); when both probabilities are the same points will be along the blue line. Data from Jones Jones_09b. code

Statistics for software engineering

caption=
Figure 239. Example of a sample drawn from a population. code
caption=
Figure 240. Date of introduction of a cpu against its commercial lifetime. Data from Culver Culver_10. code
caption=
Figure 241. A population of items having one of three colors and three strata sampled from it. code
caption=
Figure 242. Power consumed by three SERT benchmark programs at various levels of system load; crosses at 2% load intervals, lines based on 10% load intervals. Data kindly provided by Kistowski Kistowski_15. code
caption=
Figure 243. Distribution of 4,000 sample means for two sample sizes drawn from exponential (left), lognormal (center) and Pareto (right) distributions; vertical lines are 95% confidence bounds. The blue curve is the Normal distribution predicted by theory. code
caption=
Figure 244. Mean (red) and standard deviation (grey lines; they are not symmetrical because of the log scaling) of samples of 3 items drawn from a population of 1,000 items (blue line mean, green line standard deviation). Data kindly provided by Chen Chen_12. code
caption=
Figure 245. Density plot of mean of samples containing 3 or 12 items randomly selected from a data set of 1,000 items; process repeated 1,000 times for each sample size. Data kindly provided by Chen Chen_12. code
caption=
Figure 246. Number of commits to glibc for each day of the week, for the years from 1991 to 2012. Data from González-Barahona et al Gonzalez-Barahona_14. code
caption=
Figure 247. A Normal distribution with mean=4 and variance=8 and a Chi-squared distribution with four degrees of freedom having the same mean and variance (the vertical lines are at the distributions' median value). code
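The parameter matching in Figure 247 follows from the standard moments of the Chi-squared distribution: with $k$ degrees of freedom its mean is $k$ and its variance is $2k$, so for $k=4$:

$$\mathrm{E}[X] = k = 4, \qquad \mathrm{Var}[X] = 2k = 8$$

The Normal distribution is symmetric, so its median equals its mean; the Chi-squared distribution is right-skewed, so its median is below its mean.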
caption=
Figure 248. Density plot of execution time of 1,000 input data sets, with lines marking the mean, median and mode. Data kindly supplied by Chen Chen_12. code
caption=
Figure 249. Impact of serial correlation, AR(1) in this example, on the calculated mean (upper) and standard deviation (lower) of a sample (the legends specify the amount of serial correlation). code
caption=
Figure 250. Occurrence of sample median and mean values for 1,000 samples drawn from a binomial distribution. code
caption=
Figure 251. A contaminated normal, values drawn from two normal distributions with 10% of values drawn from a distribution having a standard deviation five times greater than the other. code
caption=
Figure 252. Regression model (red line; p-value=0.02) fitted to the number of correct/false security code review reports made by 30 professionals; blue lines are 95% confidence intervals. Data from Edmundson et al Edmundson_13. code
caption=
Figure 253. Bootstrapped regression lines fitted to random samples of the number of correct/false security code review reports made by 30 professionals. Data from Edmundson et al Edmundson_13. code
caption=
Figure 254. Kernel density plot, with 95% confidence interval, of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
caption=
Figure 255. The four related quantities in the design of experiments. code
caption=
Figure 256. Examples of the impact of population prevalence, statistical power and p-value on number of false positives and false negatives. code
caption=
Figure 257. Visualization of Cohen’s $d$ for two normal distributions having different means and the same standard deviation (two left) and both different (right). code
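A minimal sketch of the effect size visualized in Figure 257, using the pooled standard deviation form of Cohen's $d$. This is one common convention and an assumption here, since the caption does not say which form is used when the standard deviations differ; the function name `cohens_d` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Difference in sample means, in units of the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


d = cohens_d([2, 3, 4], [1, 2, 3])
```

Here both samples have a sample variance of 1 and their means differ by 1, so `d` is one pooled standard deviation.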
caption=
Figure 258. The impact of differences in mean and standard deviation on the overlap between two populations ($\alpha$: probability of making a false positive error, and $\beta$: probability of making a false negative error). code
caption=
Figure 259. The power of a statistical test at detecting that a difference exists between the mean value of two samples drawn from two populations, both having a Normal distribution. code

Regression modeling

caption=
Figure 260. Relationship between data characteristics (edge labels) and applicable techniques (node labels) for building regression models.
caption=
Figure 261. Total lines of source code in FreeBSD by days elapsed since the project started (in 1993). Data from Herraiz Herraiz_08. code
caption=
Figure 262. Estimated cost and duration of 73 large Dutch federal IT projects, along with fitted model and 95% confidence intervals. Data from Kampstra et al Kampstra_09. code
caption=
Figure 263. Number of updates and fixes in each Linux release between version 2.6.11 and 3.2. Data from Corbet et al Corbet_12. code
caption=
Figure 264. The number of commits made and the number of contributing developers for Linux versions 2.6.0 to 3.12. The green line in the right plot is the regression model fitted by switching the x/y values. Data from Kroah-Hartman Kroah-Hartman_14. code
caption=
Figure 265. Effort/Size of various projects and regression lines fitted using Effort as the response variable (red, with green 95% confidence intervals) and Size as the response variable (blue). Data from Jørgensen et al Jorgensen_0?. code
caption=
Figure 266. Lines of code in every initial release (i.e., excluding bug-fix versions of a release) of the Linux kernel since version 1.0, along with fitted straight line (upper) and quadratic (lower) regression models. Data from Israeli et al Israeli_10. code
caption=
Figure 267. Actual (left of vertical line) and predicted (right of vertical line) total lines of code in Linux at a given number of days since the release of version 1.0, derived from a regression model built from fitting a cubic polynomial to the data (dashed lines are 95% confidence bounds). Data from Israeli et al Israeli_10. code
caption=
Figure 268. Number of classes in the Groovy compiler at each release, in days since version 1.0. Data from Vasa Vasa_10. code
caption=
Figure 269. For each distinct language, the number of lines committed on Github and the number of questions tagged with that language. Data from Kunst Kunst_13. code
caption=
Figure 270. Percentage of vulnerabilities detected by developers working a given number of years in security. Data extracted from Edmundson et al Edmundson_13. code
caption=
Figure 271. Hours to develop software for 29 embedded consumer products and the amount of code they contain, with fitted regression model and loess fit (yellow). Data from Fenton et al Fenton_08. code
caption=
Figure 272. Points remaining after removal of overly influential observations, repeatedly applying Cook’s distance and Studentized residuals. Data from Fenton et al Fenton_08. code
caption=
Figure 273. Points remaining after removal of overly influential observations, also taking into account the Bonferroni p-value of the Studentized residuals; the line shows the fitted model and 95% confidence interval (loess fit in yellow). Data from Fenton et al Fenton_08. code
caption=
Figure 274. influenceIndexPlot for the model having the fitted line shown in Figure. Data from Fenton et al Fenton_08. code
caption=
Figure 275. Number of medical devices reported recalled by the US Food and Drug Administration, in two week bins. Upper: fitted straight line and confidence bounds, with loess fit (green); Lower: straight line (purple) fitted after two outliers replaced by mean and original fit (red). Data from Alemzadeh et al Alemzadeh_13. code
caption=
Figure 276. influenceIndexPlot of data from Alemzadeh et al Alemzadeh_13. code
caption=
Figure 277. Two fitted straight lines and confidence intervals, one up to the end of 2010 and one after 2010. Data from Alemzadeh et al Alemzadeh_13. code
caption=
Figure 278. Results from various studies of software requirements function points counted using COSMIC and FPA; lines are loess fits to studies based on industry and academic counters. Data from Amiri et al Amiri_11. code
caption=
Figure 279. Five different equations fitted to the Embedded subset of the COCOMO 81 data before influential observation removal (upper) and after influential observation removal (lower). Data from Boehm Boehm_81. code
caption=
Figure 280. Anscombe data sets with Pearson correlation coefficient, mean, standard deviation, and line fitted using linear regression. Data from Anscombe Anscombe_73. code
caption=
Figure 281. Residual of the straight line fit to the Linux growth data analysed in Figure (upper) and data+straight line fit (red) and loess fit (blue). Data from Israeli et al Israeli_10. code
caption=
Figure 282. Array element assignment benchmark compiled with gcc using the O0 (upper) and O3 (lower) options (measurements were grouped into runs of 2,000 executions). Data from Flater et al Flater_13. code
caption=
Figure 283. Number of installations of Debian packages against the age of the package, plus fitted model and loess fit. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
caption=
Figure 284. Quadratic relationship with various amounts of added noise fitted using a quadratic and exponential model. code
caption=
Figure 285. Author workload against number of activity types per author (upper) and ratio test (lower). Data from Vasilescu et al Vasilescu_12. code
caption=
Figure 286. Change points detected by cpt.mean, upper using method="AMOC" and lower using method="PELT". Data from Alemzadeh et al Alemzadeh_13. code
caption=
Figure 287. Number of flags (y-axis jittered) used to control the selection of optional features in a system containing a given total number of features, loess curve (red), regression line (green). Data from Berger et al Berger_12. code
caption=
Figure 288. Monthly unit sales (in thousands) of 4-bit microprocessors. Data kindly supplied by Turley Turley_02. code
caption=
Figure 289. Fitted regression line to points (in red) and 3-D illustration of assumed Normal distribution of errors. code
caption=
Figure 290. Number of vulnerabilities detected by professional developers with web security review experience; upper: technically correct plot of model fitted using a Poisson distribution, lower: easier-to-interpret curve representation of fitted regression models that assume the error has a Poisson distribution (continuous lines) or a Normal distribution (dashed lines). Data extracted from Edmundson et al Edmundson_13. code
caption=
Figure 291. Number of functions containing a given number of break statements and a fitted Negative Binomial distribution. Data from Jones Jones_05a. code
caption=
Figure 292. Code review meeting duration for a given number of non-comment lines of code. Fitted regression model, assuming errors have a Gamma distribution (red, with confidence interval in blue) or a Normal distribution (green). Data from Porter et al Porter_98. code
caption=
Figure 293. Number of APIs used in Java programs containing a given number of lines and three fitted models. Data from Starek Starek_10. code
caption=
Figure 294. Yearly development cost and lines of Fortran code delivered to the US Air Force between 1962 and 1984, with fitted regression models. Data extracted from NeSmith NeSmith_86. code
caption=
Figure 295. Maintenance task effort and lines of code added+updated, with fitted regression model (red) and SIMEX adjusted for 10% error (blue). Data from Jørgensen Jorgensen_95. code
caption=
Figure 296. Regression modeling 0/1 data with a straight line and a logistic equation. code
caption=
Figure 297. ROC curve for the data listed in Table. code
caption=
Figure 298. Percentage of mutants killed at various percentage of path coverage for 300 or so Java projects; fitted Beta (red) and glm (blue) regression models. Data from Gopinath et al Gopinath_14. code
caption=
Figure 299. SPECint 2006 performance results for processors running at various clock rates, memory chip frequencies and processor family. Data from SPEC SPEC_14. code
caption=
Figure 300. Component+residual plots for three explanatory variables in a fitted SPECint model. code
caption=
Figure 301. Individual contribution of each explanatory variable to the response variable in a quadratic model of SPECint performance. code
caption=
Figure 302. Estimated and actual effort broken down by communication frequency, along with individually fitted straight lines. Data from Moløkken-Østvold et al Molokken_Ostvold_07. code
caption=
Figure 303. Illustration of the shared and non-shared contributions made by two explanatory variables to the response variable Y. code
caption=
Figure 304. pairs plot of lines added/modified/removed, growth and number of files and total lines in versions 2.6.0 through 3.9 of the Linux kernel. Data from Kroah-Hartman Kroah-Hartman_14. code
caption=
Figure 305. Example plots of functions listed in Table. These equations can be inverted, so they start high and go down. code
caption=
Figure 306. Time to execute a computational biology program on systems containing processors with various L2 cache sizes. Data kindly provided by Hazelhurst Hazelhurst_10. code
caption=
Figure 307. A logistic equation fitted to the lines of code in every non-bugfix release of the Linux kernel since version 1.0. Data from Israeli et al Israeli_10. code
caption=
Figure 308. Predictions by logistic equations fitted to Linux SLOC data, using subsets of the data up to 2,900, 3,650 and 4,200 days, and all days, since the release of version 1.0. Data from Israeli et al Israeli_10. code
caption=
Figure 309. Increase in areal density of hard disks entering production over time. Data from Grochowski et al Grochowski_12. code
caption=
Figure 310. Lines of code in the GNU C library against days since 1 January 1990. Data from González-Barahona Gonzalez-Barahona_14. code
caption=
Figure 311. Number of failing programs caused by unique faults in gcc (upper) and SpiderMonkey (lower). Fitted model in green, with two exponential components in red and blue. Data kindly provided by Chen Chen_13. code
caption=
Figure 312. Power law (red) and exponential (blue) fits to feature macro usage in 20 systems written in C; fail to reject p-value for 20 systems is 0.64. Data from Queiroz et al Queiroz_17. code
caption=
Figure 313. Power consumption of six different Intel Core i5-540M processors running at various frequencies; colored lines denote fitted regression models for each processor. Data from Balaji et al Balaji_12. code
caption=
Figure 314. Example showing the three ways of structuring a mixed effects model, i.e., different intersections/same slope (upper), same intersection/different slopes (middle) and different intersections/slopes (lower). code
caption=
Figure 315. Confidence intervals, 95%, for within-subject intercept and slope (right plots) of mixed-effect models in the adjacent code. code
caption=
Figure 316. The three components of the hourly rate of commits, during a week, to the Linux kernel source tree; components extracted from the time series by stl. Data from Eyolfson et al Eyolfson_11. code
caption=
Figure 317. Autocorrelation of number of defects found on a given day, for development project C. Data kindly provided by Buettner Buettner_08. code
caption=
Figure 318. Autocorrelation of two AR models (upper plots) and two MA models (lower plots). code
caption=
Figure 319. Partial autocorrelation of same two AR models (upper plots) and two MA models (lower plots) shown in Figure. code
caption=
Figure 320. Autocorrelation of indentation of source code written in various languages. Data from Hindle et al Hindle_08. code
caption=
Figure 321. Number of features started for each day and fitted regression trend line (left) and number of features after subtracting the trend (right), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 322. Autocorrelation (left) and partial autocorrelation (right) of the number of features started on a given day (after differencing the log transformed data), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 323. Predicted daily difference in the number of new feature starts (red) and 95% confidence intervals (blue). Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 324. Time series whose values are uncorrelated (upper), but whose squared values are correlated (lower); see code for generation process. code
caption=
Figure 325. Cross correlation of feature release ‘size’ (upper non-bugfix releases, lower all releases) and date when bugs are prioritised. Data kindly supplied by 7Digital 7Digital_12. code
caption=
Figure 326. Estimated staff working on a project during every week. Data from Buettner Buettner_08. code
caption=
Figure 327. Market share of Firefox version 3.0 fitted using loess regression with various values of the span option. Data from W3Counter W3Counter_14. code
caption=
Figure 328. Cross-correlation of source lines added/deleted per week to the glibc library. Data from González-Barahona Gonzalez-Barahona_14. code
caption=
Figure 329. Visualization of alignment between weekly time series of lines of code in NetBSD (blue) and FreeBSD (red). Data from Herraiz Herraiz_08. code
caption=
Figure 330. Effort distribution (person hours) over the eight main tasks of a development project at Rolls-Royce and a hierarchical clustering of each task effort time series based on pair-wise correlation and Euclidean distance metrics. Data extracted from Powell Powell_01. code
caption=
Figure 331. Two commonly used hazard functions: Weibull, which is monotonic (always increases, decreases or remains the same), and Lognormal, which can increase and then decrease. code
caption=
Figure 332. Observation period with events inside and outside the study period. code
caption=
Figure 333. The Kaplan-Meier curve for survivability of new releases: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals. Data from Businge Businge_13. code
caption=
Figure 334. The Kaplan-Meier curve for survivability of ETPs' ability to be built using SDKs released in subsequent years: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals, with plus signs, +, indicating censored data. Data from Businge Businge_13. code
caption=
Figure 335. Kaplan-Meier curves for time-to-fix…. Data from Arora et al Arora_10. code
caption=
Figure 336. Survival curve after adjustment for explanatory variables… code
caption=
Figure 337. Cumulative incidence curves for problems reported by the splint tool in Samba and Squid (time is measured in number of snapshot releases). Data from Di Penta et al Di_penta_09. code
caption=
Figure 338. Rose diagram of number of commits in each 3 hour period of a day for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
caption=
Figure 339. The Cartwright (red; dcarthwrite), wrapped Cauchy (green; dwrappedcauchy) and wrapped von Mises (blue; dvonmises) circular probability distributions for various values of their parameters. code
caption=
Figure 340. Asymmetric extended wrapped forms of the Cardioid (upper), von Mises (middle) and Cauchy (lower) probability distributions for various values of their parameters. code
caption=
Figure 341. Number of commits (upper) and number of commits in which a fault was detected (lower) by hour of day of the commit, for Linux. Data from Eyolfson et al Eyolfson_14. code
caption=
Figure 342. Number of commits per hour for weekdays and fitted model (upper) and number of commits in which a fault was detected (lower), for Linux. Data from Eyolfson et al Eyolfson_14. code
caption=
Figure 343. Number of commits per hour for each weekday, fitted using $\cos(...\cos...)$ (upper) and $\cos(...\cos+\sin...)$ (lower), for Linux; in both cases the fitted fault model (red) has been rescaled to allow comparison. Data from Eyolfson et al Eyolfson_14. code
caption=
Figure 344. Application source lines against percentage of covered lines achieved by both Human & Dynodroid tests, by only Dynodroid tests and by only Human tests. Data from Machiry et al Machiry_13. code
caption=
Figure 345. Percentage of source lines covered by both Human & Dynodroid tests, by only Dynodroid tests and by only Human tests; fitted regression line and prediction points for various total source lines, red plus. Data from Machiry et al Machiry_13. code

Other techniques

caption=
Figure 346. Volume of unit sphere in 1 to 50 dimensions, e.g., sphere has volume $\frac{4}{3}\pi$ in three dimensions. code
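A minimal sketch of the calculation behind a plot like Figure 346, using the standard closed form for the volume enclosed by a unit sphere in $n$ dimensions, $V_n = \pi^{n/2} / \Gamma(n/2 + 1)$. The function name `unit_ball_volume` is hypothetical, and this is not necessarily how the figure's own code computes the values.

```python
import math


def unit_ball_volume(n):
    """Volume enclosed by a unit-radius sphere in n dimensions."""
    return math.pi ** (n / 2) / math.gamma(n / 2 + 1)


# Volumes for the range of dimensions covered by the figure.
volumes = {n: unit_ball_volume(n) for n in range(1, 51)}
```

In three dimensions this gives $\frac{4}{3}\pi$; the volume peaks at five dimensions and then shrinks toward zero as the number of dimensions increases.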
caption=
Figure 347. Top levels of the decision tree built from the reopened fault data. Data from Shihab et al Shihab_10a. code
caption=
Figure 348. A Bertin plot for items included in the same data structure as ‘Antibiotics used’, for each subject, after reordering by seriate. Data from Jones Jones_09b. code
caption=
Figure 349. A visualization of the Robinson matrix based on number of times pairs of items co-occur in the same data structure (the closer to the diagonal the more often they occur together). Data from Jones Jones_09b. code

Experiments

caption=
Figure 350. Time taken, by the same person, to implement 12 algorithms from the Communications of the ACM, with four iterations of the implementation process. Data from Zislis Zislis_73. code
caption=
Figure 351. Time taken to transfer and multiply 2-dimensional matrices of various sizes on a GTX 480 GPU. Data kindly supplied by Gregg and Hazelwood Gregg_11. code
caption=
Figure 352. Relative performance (y-axis) of libraries optimized to run on various processors (x-axis). Data from Bird Bird_10. code
caption=
Figure 353. Number of integer constants having the lexical form of a decimal-constant (the literal 0 is also included in this set) and hexadecimal-constant that have a given value. Data from Jones Jones_05a. code
caption=
Figure 354. One and two-sided significance testing. code
caption=
Figure 355. A cube plot of three configuration factors and corresponding benchmark results (blue) from the Memo table experiment. Data from Citron et al Citron_03b. code
caption=
Figure 356. Design plot showing the impact of each Memo table configuration factor on benchmark performance. Data from Citron et al Citron_03b. code
caption=
Figure 357. Interaction plot showing how cint changes with size for given values of associativity and mapping. Data from Citron et al Citron_03b. code
caption=
Figure 358. Number of Reflection benchmark results achieving a given score, reported for GTX 970 cards from three third-party manufacturers. Data extracted from UserBenchmark.com. code
caption=
Figure 359. Density plots of project bids submitted by companies before/after seeing a requirements document. Data from Jørgensen et al Jorgensen_04c. code
caption=
Figure 360. Density plot of task implementation estimates: with no instructions (red) and with instructions on what to do (blue). Data from Jørgensen et al Jorgensen_04. code
caption=
Figure 361. Examples of correlation between samples of two value pairs, plotted on the x and y axes. code
caption=
Figure 362. Number of software faults having a given consequence, based on an analysis of faults in Cassandra. Data from Gunawi et al Gunawi_14. code
caption=
Figure 363. Performance and rental cost of early computers, with straight line fits for a few years. Data from Knight Knight_66. code
caption=
Figure 364. Feature size, in Silicon atoms, of microprocessors. Data from Danowitz et al Danowitz_12. code
caption=
Figure 365. Maximum number of records sorted in 1 minute and using 1 penny’s worth of system time (upper). SPEC2006 integer benchmark results (lower). Data from Gray et al Gray_14 and SPEC SPEC_14. code
caption=
Figure 366. Total system power consumed when sorting 10, 20, 30, 40, 50 million integers (colored pluses) using three techniques running on the same processor at different clock frequencies. Data from Götz et al Gotz_14. code
caption=
Figure 367. Power consumed by 10 Atmel SAM3U microcontrollers at various temperatures when sleeping or running. Data from Wanner et al Wanner_10. code
caption=
Figure 368. Power spectrum of electrical power consumed by an app running on a ???. Data from Saborido et al Saborido_15. code
caption=
Figure 369. Read bandwidth at various offsets for new disks sold in 2002 (upper) and 2006 (lower). Data kindly provided by Krevat Krevat_13. code
caption=
Figure 370. Average power consumed by one server’s CPU (four Pentium 4 Xeons; red) and memory (8 GB PC133 DIMMs; blue) running the SPEC CPU2006 benchmark (upper), and breakdown by system component when executing various programs (lower). Data from Bircher Bircher_10. code
caption=
Figure 371. FFT benchmark executed 2,048 times followed by system reboot, repeated 10 times. Data kindly provided by Kalibera Kalibera_05. code
caption=
Figure 372. Percentage change, relative to no environment variables, in perlbench performance as characters are added to the environment. Data extracted from Mytkowicz et al Mytkowicz_08. code
caption=
Figure 373. Changes in SPEC CPU2006 benchmark performance caused by cache and memory bus contention for one dual processor Intel Xeon E5345 system. Data kindly provided by Babka Babka_12. code
caption=
Figure 374. Execution time of 330.art_m, an OpenMP benchmark program, using different compilers, number of threads and setting of thread affinity. Data kindly provided by Mazouz Mazouz_13. code
caption=
Figure 375. Access times when walking through memory using three fixed stride patterns (i.e., 32, 64 and 128 bytes) on a quad-core Intel Xeon E5345; grey lines at one standard deviation. Data kindly provided by Babka Babka_09. code
caption=
Figure 376. Performance variation of programs from the Talos benchmark run on original OS and a stabilised OS. Data from Larres Larres_12. code
caption=
Figure 377. Operations per second of a file-server mounted on one of ext2, ext3, rfs and xfs filesystems (same color for each filesystem) using various options. Data kindly supplied by Huang Zhou_12. code
caption=
Figure 378. Percentage change in SPEC number, relative to version 4.0.4, for 12 programs compiled using six different versions of gcc (compiling to 64-bits with the O3 option). Data from Makarow Makarow_14. code
caption=
Figure 379. Execution time of xy file compressor, compiled with gcc using various optimization options, running on various systems (lines are mean execution time when compiled using each option). Data kindly supplied by Petkovich de_Oliveira_13. code
caption=
Figure 380. Execution time of Perlbench, from the SPEC benchmark, on six systems, when linked in three different orders with address randomization on/off. Data kindly supplied by Reidemeister de_Oliveira_13. code
caption=
Figure 381. Performance of PassMark memory benchmark on 783 Intel Core i7-3770K systems; lower plot created by trimming 10% of values from the ends of what appears in the upper plot. Data kindly supplied by David Wren PassMark_14. code
caption=
Figure 382. Ubench cpu performance on small (upper) and large (lower) EC2 instances, Europe in red and US in green. Data kindly provided by Dittrich Schad_10. code
caption=
Figure 383. Lines of code that 101 professional developers, with a given number of years experience, estimate they have written. Data from Jones Jones_06a, Jones_08a, Jones_09b. code

Overview of R

caption=
Figure 384. Plot produced by hello_world.R program. code
caption=
Figure 385. The unique bytes per window (256 bytes wide) of a pdf file. code

Data preparation

caption=
Figure 386. Screen height and width reported by 682,000 unique devices that downloaded an App from OpenSignal in 2015 (upper), reported measurements ordered so height is always the larger value (lower). Data from OpenSignal OpenSignal_15. code
caption=
Figure 387. Number of reported vulnerabilities, per day, in the US National Vulnerability Database for 2003. Data from the National Vulnerability Database NVD_14. code
caption=
Figure 388. Percentage occurrence of the first digit of hexadecimal numbers in C source and estimated from Google book data. Data from Jones Jones_05a and Michel et al Michel_11. code
caption=
Figure 389. Number of processes executing for a given amount of time, with measurements expressed using two and six significant digits. Data from Feitelson Feitelson_14. code