Introduction

Figure 1. Total cost of one million computing operations over time. Data from Nordhaus Nordhaus_01. code
Figure 2. Storage cost, in US dollars per Mbyte, of mass market technologies over time. Data from McCallum McCallum_16, floppy and CD-ROM data kindly provided by Davis Davis_01. code
Figure 3. Initial growth of time-sharing systems available in the US. Data extracted from Glauthier Glauthier_67. code
Figure 4. Growth of transport and product distribution infrastructure in the USA (underlying data is measured in miles). Data from Grübler et al Grubler_91. code
Figure 5. Market capitalization of IBM, Microsoft and Apple (upper), and expressed as a percentage of the top 100 listed US tech companies (lower). Data extracted from the Economist website Economist_15. code
Figure 6. Total annual sales of computer species over the last 60 years. Data from Gordon Gordon_87 (mainframes and minicomputers), Reimer Reimer_12 (PCs) and Gartner Gartner_17 (smartphones). code
Figure 7. Power consumed, in Watts, executing an instruction on a computer available in a given year. Data from Koomey et al Koomey_11. code
Figure 8. Total investment in tangible and intangible assets by UK companies, based on their audited accounts. Data from Goodridge et al Goodridge_14. code
Figure 9. Billions of dollars of worldwide semiconductor sales per month. Data from World Semiconductor Trade Statistics WSTs_16. code
Figure 10. Smaller size allows more devices to be fabricated on the same slice of silicon, plus material defects impact a lower percentage of devices. code
Figure 11. Spectral analysis of World GDP between 1870 and 2008; peaks around 17 and 70 years. Data from Maddison Maddison_91. code
Figure 12. Changing habits in men’s facial hair. Data from Robinson Robinson_76. code
Figure 13. Number of papers, in each year between 1987 and 2003, associated with a particular IT topic. The E-commerce paper count peaks at 1,775 in 2000 and in 2003 is still off the scale compared to other topics. Data kindly provided by Wang Wang_10. code
Figure 14. Normal distribution with total percentage of values enclosed within a given number of standard deviations. code
Figure 15. Example convex, upper, and concave, lower, functions; lines are three chords of the function. code
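The percentages annotated in Figure 14 can be reproduced from the error function. The following is a minimal sketch, not the code used to draw the figure; the function name `fraction_within` is hypothetical, and the only library call is Python's standard `math.erf`.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard
    deviations of its mean (hypothetical helper, for illustration)."""
    # For a normal distribution, P(|X - mean| <= k*sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(f"within {k} standard deviation(s): "
              f"{100 * fraction_within(k):.2f}%")
```

Run as a script, this prints roughly 68.27%, 95.45% and 99.73% for one, two and three standard deviations.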

Human cognitive characteristics

Figure 16. Unless cognition and the environment in which it operates closely mesh together, no problems are solved; the blades of a pair of scissors need to closely mesh for cutting to occur. code
Figure 17. The assumption of light shining from above creates the appearance of bumps and pits. code
Figure 18. The assumption of light shining from above creates the appearance of bumps and pits. code
Figure 19. Probability that rat N1 will press a lever a given number of times before pressing a second lever to obtain food, when the target count is 4, 8, 12 and 16. Data extracted from Mechner Mechner_58. code
Figure 20. Boy/girl (aged 11-12 years) verbal reasoning, quantitative reasoning, non-verbal reasoning and mean CAT score over the three tests; each stanine band is 0.5 standard deviations wide. Data from Strand et al Strand_06. code
Figure 21. Rotate text in the real world, by tilting the head, or in the mind? code
Figure 22. Two objects paired with another object that may be a rotated version. Based on Shepard et al Shepard_71. code
Figure 23. Error rate, with standard error, for the left/right hand in a study of the SNARC effect. Data from Nuerk et al Nuerk_05. code
Figure 24. Structure of mammalian long-term memory subsystems; brain areas in red. Based on Squire et al Squire_15.
Figure 25. Percentage occurrence of binary operator pairs (as a percentage of all such pairs) against fraction of correct answers to questions about their precedence; red line is a beta regression model, with bootstrapped 95% confidence intervals. Data from Jones Jones_06a. code
Figure 26. Response time (left axis) and error percentage (right axis) on reasoning task with given number of digits held in memory. Data extracted from Baddeley Baddeley_09. code
Figure 27. Major components of working memory: working memory in yellow, long-term memory in orange. Based on Baddeley Baddeley_12. code
Figure 28. Yes/no response time (in milliseconds) as a function of the number of digits held in memory. Data extracted from Sternberg Sternberg_69. code
Figure 29. Parse tree of a sentence with no embedding, upper "S 1", and a sentence with four degrees of embedding, lower "S 4". Based on Miller et al Miller_64. code
Figure 30. Sequencing errors (as percentage) after interruptions of various length (red), including 95% confidence intervals, normal sequence error rate in green; lines are fitted model predictions. Data from Altmann et al Altmann_17. code
Figure 31. Semantic memory representation of alphabetic letters (the numbers listed along the top are place markers and are not stored in subject memory). Readers may recognize the structure of a nursery rhyme in the letter sequences. Derived from Klahr Klahr_83. code
Figure 32. Probability of correct recall of words by serial presentation order (each word visible for 1 or 2 seconds, last digit in legend). Data extracted from Murdock Murdoch_62. code
Figure 33. Time taken to solve the same jig-saw puzzle 35 times, followed by a two-week interval and then another 35 times, with power law and exponential fits. Data extracted from Alteneder Alteneder_35. code
Figure 34. Completion times of eight solo developers for each implementation round. Data kindly provided by Lui Lui_06. code
Figure 35. Time taken, by the same person, to implement 12 algorithms from the Communications of the ACM (each colored line), with four iterations of the implementation process. Data from Zislis Zislis_73. code
Figure 36. Time taken by 24 subjects, classified by years of professional experience, to complete successive tasks. Data from Latorre Latorre_14. code
Figure 37. Subjects' belief response curves for positive weak–strong, negative weak–strong, and positive–negative evidence. Based on Hogarth et al Hogarth_92. code
Figure 38. Country boundaries distort judgement of relative city locations. Based on Stevens et al Stevens_78.
Figure 39. Orthogonal representation of shape, color and size stimuli. Based on Shepard Shepard_61.
Figure 40. The six unique configurations of selecting four times from eight possibilities, i.e., it is not possible to rotate one configuration into another within these six configurations. Based on Shepard Shepard_61.
Figure 41. Percentage of correct answers given by one subject, against boolean-complexity of category, colored by number of positive cases needed to define the category. Data kindly provided by Feldman Feldman_00. code
Figure 42. The Berlin and Kay Berlin_69 language color hierarchy. The presence of any color term in a language implies the existence, in that language, of all terms below it. Papuan Dani has two terms (black and white), while Russian has eleven (Russian may also be an exception, in that it has two terms for blue). code
Figure 43. Cup- and bowl-like objects of various widths (ratios 1.2, 1.5, 1.9, and 2.5) and heights (ratios 1.2, 1.5, 1.9, and 2.4). The percentage of subjects who selected the term cup or bowl to describe the object they were shown (the paper did not explain why the figures do not sum to 100%). Based on Labov Labov_73. code
Figure 44. A commercial event involving a buyer, seller, money, and goods; as seen from the buy, sell, pay, or charge perspective. Based on Fillmore Fillmore_77. code
Figure 45. Lines of code correctly recalled after a given number of 2 minute memorization sessions; upper plot actual program, lower plot line order scrambled. Data extracted from McKeithen et al McKeithen_81. code
Figure 46. Examples of features that may be preattentively processed (parallel lines and the junction of two lines are the odd ones out). Based on Ware Ware_00.
Figure 47. Continuity: the upper left plot is perceived as two curved lines; Closure: when the two perceived lines are joined at their ends (upper right), the perception changes to one of two cone-shaped objects; Symmetry and parallelism: where the direction taken by one line follows the same pattern of behavior as another line; Proximity: the horizontal distance between the dots in the lower left plot is less than the vertical distance, causing them to be perceptually grouped into lines (the relative distances are reversed in the right plot); Similarity: a variety of dimensions along which visual items can differ sufficiently to cause them to be perceived as being distinct; rotating two line segments by 180° does not create as big a perceived difference as rotating them by 45°. code
Figure 48. Perceived grouping of items on a line may be by shape, color or proximity. Based on Kubovy et al kubovy_08. code
Figure 49. Examples of unique items among visually similar items. Those in the left column include an item that has a distinguishing feature (a vertical line or a gap); those in the right column include an item that is missing a distinguishing feature. Based on displays used by Treisman et al Treisman_85. code
Figure 50. The foveal, parafoveal and peripheral vision regions when three characters visually subtend 3°. Based on Schotter et al Schotter_12. code
Figure 51. Local context can change the interpretation given to the surrounding symbols. code
Figure 52. Example object layout and the corresponding ordered tree produced from the answers given by one subject. Data extracted from McNamara et al McNamara_89. code
Figure 53. Hierarchical clustering of statement recall order, averaged over teachers and then students; label names are: program_list-statementkind, where statementkind might be a function header, loop, etc. Data extracted from Adelson Adelson_81. code
Figure 54. Heat map of one subject’s cumulative fixations (black dots) on a screen image. Data kindly provided by Ali Ali_12. code
Figure 55. The four cards used in the Wason selection task. Based on Wason Wason_68. code
Figure 56. Average time (in milliseconds) taken for subjects to enumerate O’s in a background of X or Q distractors. Based on Trick and Pylyshyn Trick_93. code
Figure 57. Probability a subject will successfully distinguish a difference between the number of dots displayed and a specified target number (x-axis is the difference between these two values). Data extracted from van Oeffelen et al van_Oeffelen_82. code
Figure 58. Line locations chosen for the numeric values seen by each of four subjects; color of fitted loess line changes at one million boundary. Data kindly provided by Landy Landy_17. code
Figure 59. Number of errors, in 132 simple multiplication trials (e.g., $3\cdot7$), upper plot shows operand values (a loess fit in yellow) and lower plot result value (points where both operands have the same value are in blue). Data from Campbell Campbell_97. code
Figure 60. One subject’s response time over successive blocks of command line trials and fitted loess (in green). Data kindly provided by Remington Remington_16. code
Blah… Data from Stewart et al Stewart_15. code
Figure 61. Each row shows a scaled version of the three stripes, along with actual lengths in inches, from which subjects were asked to select the longest. Based on Asch Asch_56. code
Figure 62. Risk neutral (green, $u(w)=w$), risk loving (red, quadratic) and risk averse (blue, square-root) utility functions. code
Figure 63. Subjects' estimates of their ability to correctly answer a question (x-axis) against their actual performance in answering (left scale). The responses of a person with perfect self-knowledge are given by the solid line. Data extracted from Lichtenstein et al Lichtenstein_77. code
Figure 64. Perceived present value (moving through time to the right) of two future rewards. code
Figure 65. The five possible ways in which experimenter’s rule and subject’s rule hypothesis can overlap, in the space of all possible rules; based on Klayman et al Klayman_87. code
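The three utility functions plotted in Figure 62 can be compared on a simple gamble. The following is a minimal sketch: the 50/50 gamble between 0 and 100, the exact quadratic form `w ** 2`, and all function names are assumptions made for illustration, and are not taken from the figure's code.

```python
import math


def u_neutral(w: float) -> float:
    """Risk neutral: u(w) = w, as in the caption of Figure 62."""
    return w


def u_loving(w: float) -> float:
    """Risk loving: a quadratic (the exact form w**2 is an assumption)."""
    return w ** 2


def u_averse(w: float) -> float:
    """Risk averse: square-root utility."""
    return math.sqrt(w)


def expected_utility(u, gamble):
    """Expected utility of a gamble given as (probability, wealth) pairs."""
    return sum(p * u(w) for p, w in gamble)


def preference(u, gamble) -> str:
    """Compare taking the gamble against receiving its expected wealth."""
    expected_wealth = sum(p * w for p, w in gamble)
    eu_gamble = expected_utility(u, gamble)
    u_sure = u(expected_wealth)
    if math.isclose(eu_gamble, u_sure):
        return "indifferent"
    return "gamble" if eu_gamble > u_sure else "sure thing"


if __name__ == "__main__":
    # Hypothetical gamble: equal chance of ending up with 0 or 100.
    gamble = [(0.5, 0.0), (0.5, 100.0)]
    for name, u in (("neutral", u_neutral),
                    ("loving", u_loving),
                    ("averse", u_averse)):
        print(f"risk {name}: prefers {preference(u, gamble)}")
```

With a concave (square-root) utility the sure amount is preferred, with a convex (quadratic) utility the gamble is preferred, and with a linear utility the two are valued equally.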

Cognitive capitalism

Figure 66. Company revenue ($millions) against total software development costs. Data from Mulford et al Mulford_16. code
Figure 67. Average Return On Invested Capital of various U.S. industries between 1992 and 2006. Data from Porter Porter_08. code
Figure 68. Development cost (adjusted to 2018 dollars) of computer video games whose cost was more than $50 million. Data from Wikipedia Wiki_Games_18. code
Figure 69. Ratio of actual to estimated hours of effort to enhance an existing product, for 25 versions of one application. Data from Huijgens et al Huijgens_16. code
Figure 70. Accounting practice for breaking down income from sales… code
Figure 71. Average effort (in days) used to fix a defect detected in a given phase (x-axis) that had been introduced in an earlier phase (colored lines); total of 38,120 defects in projects at Hughes Aircraft. Data extracted from Willis et al Willis_98. code
Figure 72. Months of developer effort needed to produce systems containing a given number of lines of code, for various application domains. Data from Gayek et al Gayek_04. code
Figure 73. Introductory price and performance (measured using the wPrime32 benchmark) of various Intel processors between 2003 and 2013. Data from Sun Sun_14. code
Figure 74. Example supply and demand curves. code
Figure 75. Rates at which product sales are made on Gumroad at various prices; lines join prices that differ by 1¢, e.g., $1.99 and $2. Data from Nichols Nichols_13. code
Figure 76. Growth of GitHub users during its first 58 months. Data from Irving Irving_16. code
Figure 77. Sales of game software (solid lines) for the corresponding three major seventh generation hardware consoles (dotted lines). Data from VGChartz VGChartz_17. code
Figure 78. Percentage of sales closed in a given week of a quarter, with average discount given. Data from Larkin Larkin_13. code
Figure 80. Top 100 software companies ranked by total revenue (in millions of dollars) and ranked by Software-as-a-Service revenue. Data from PwC PwC_13, PwC_14, PwC_16. code
Figure 81. Various vendors' retail and upgrade prices for C and C++ compilers available under MS-DOS and Microsoft Windows between 1987 and 1998. Data kindly provided by Viard Viard_07. code
Figure 82. Interval between product preannouncement date and its promised availability date against delay between promised date and actual date product became available. Data from Bayus et al Bayus_01. code

Ecosystems

Figure 83. Total gigabytes of DRAM shipped world-wide in given year, along with shipments by device capacity (in bits). Data from Victor et al Victor_02. code
Figure 84. Yearly expenditure on punched cards and tabulating equipment by the UK government. Data from Agar Agar_03. code
Figure 85. Mean age of installed mainframe computers, 1968-1983. Data from Greenstein Greenstein_94. code
Figure 86. Computer installation market share of IBM and its top seven competitors (known at the time as the seven dwarfs; no data is available for 1969). Data from Brock Brock_75. code
Figure 87. Mobile phone operating system shipments, as percentage of total per year. Data from Reimer Reimer_12 (before 2007) and Gartner Gartner_17 (after 2006). code
Figure 88. Maximum speed achieved by vehicles over the surface of the Earth and in the air, over time. Data from Lienhard Lienhard_06. code
Figure 89. Phylogenetic tree of Debian derived distributions, based on 50,708 possible packages included in each distribution. Data from Keil et al Keil_16. code
Figure 90. Number of transistors, frequency and SPEC performance of CPUs when first launched. Data from Danowitz et al Danowitz_12. code
Figure 91. Number of process model change requests made in three years of a banking Customer registration project. Data kindly supplied by Branco Branco_12. code
Figure 92. Size at foundation and lifetime of 53 secular and religious 19th century American utopian communities. Data from Dunbar et al Dunbar_18. code
Figure 93. Total instructions in the software shipped with various models of IBM computer, plus Datatron from Burroughs. Data extracted from Naur et al Naur_69. code
Figure 94. Size of 40 operating systems (Kbytes, measured in 1975) capable of controlling a given number of unique devices, plus quadratic regression model. Data from Elci Elci_75. code
Figure 95. Total value of custom and packaged software (hardware vendor+third-party) sales in the US. Data from Phister Phister_79. code
Figure 96. Sorted list of total amount awarded by bug bounties to individual researchers, based on two datasets downloaded from HackerOne. Data from Zhao et al Zhao_15 and Maillart et al Maillart_17. code
Figure 97. Estimated number of comments written in German, in the LibreOffice source code. Data from Meeks Meeks_17. code
Figure 98. Percentage of function definitions declared to have a given number of parameters in: embedded applications, the SPECint95 benchmark???, and the translated form of C source benchmark programs. Data for embedded and SPECint95 kindly supplied by Engblom Engblom_99a, C book data from Jones Jones_05a. code
Figure 99. Hours required to build a car radio after the production of a given number of radios, with break periods (shown in days above x-axis); lines are models fitted to each production period. Data extracted from Nembhard et al Nembhard_01. code
Figure 100. Man-hours required to build a particular kind of ship, at the Delta Shipbuilding yard, delivered on a given date (x-axis). Data from Thompson Thompson_07. code
Figure 101. Total computer systems purchased and rented by the US Federal Government in the respective fiscal years ending June 30. Data from US Government General Accounting Office Staats_71. code
Figure 102. Total U.S. revenue from sale of computer systems and data processing service industry revenue. Data from Phister Phister_79 table II.1.20 and II.1.26. code
Figure 103. Yearly development cost and lines of code delivered to the US Air Force between 1960 and 1986. Data extracted from NeSmith NeSmith_86. code
Figure 104. Total sales of various kinds of processors. Data from Hilbert et al Hilbert_11. code
Figure 105. Maximum memory capacity and average cost of computer systems in 1981. Data from Ein-Dor Ein-Dor_85. code
Figure 106. Monthly unit sales (in millions) of microprocessors having a given bus width. Data kindly supplied by Turley Turley_02. code
Figure 107. TSMC revenue from wafer production, as a percentage of total revenue, at various line widths. Data from TSMC TSMC_17. code
Figure 108. Number of new UK companies registered each month, whose SIC description includes the word software or computer (case not significant). Data extracted from OpenCorporates OpenCorporates_15. code
Figure 109. Number of companies manufacturing cars and PCs… PC company data from Stavins Stavins_95 and …, automobile manufacturer data from ??? <book ???>. code
Figure 110. Connections between companies in a Dutch software business network. Data kindly provided by Crooymans Crooymans_15. code
Figure 111. Reported worldwide software industry Mergers and Acquisitions (M&A). Data from Solganick Solganick_16. code
Figure 112. Loess fits to time taken to publish an RFC having Standard or non-Standard status, for IETF committees having a given percentage of commercial membership (people wearing suits). Data from Simcoe Simcoe_13. code
Figure 113. Percentage of employment by US industry sector 1850-2009. Data kindly provided by Kossik Kossik_11. code
Figure 114. Number of people working in the 12 computer occupation codes assigned by the U.S. Census Bureau during 2014, stratified by age bands (the ‘Software developers, applications and system software’ code contains the largest percentage; see code for the identity of other occupation codes). Data from Beckhusen Beckhusen_16. code
Figure 115. Payer and payee countries of bug bounties (total value over ???). Data from HackerOne Hackerone_17. code
Figure 116. Decade in which newly designed US Air Force aircraft first flew, with colors indicating current operational status. Data from Eckbreth et al Eckbreth_11. code
Figure 117. Daily minutes spent using an App, from Apple’s AppStore, … Data extracted from Ansar <book Ansar_1?>. code
Figure 118. Ratio of development costs to total five-year maintenance costs for 158 IBM software systems sorted by size; curve is a beta distribution fitted to the data (in red). Data from Dunn Dunn_11. code
Figure 119. Number of software systems surviving to a given number of years and exponential equation fits. Data from Tamai Tamai_92. code
Figure 120. Age of systems, developed using one of two methodologies, and corresponding monthly maintenance time, along with loess fits. Data extracted from Dekleva Dekleva_92. code
Figure 121. Percentage of patches submitted to WebKit (34,535 in total) transitioning between various stages of code review. Data from Baysal et al Baysal_13. code
Figure 122. Number of forked projects per year, identified using Wikipedia during August 2011. Data from Robles et al Robles_12b.
Figure 123. Percentage of code ported from NetBSD to various versions of OpenBSD, broken down by version of NetBSD in which it first occurred (denoted by incrementally changing color). Data kindly provided by Ray Ray_13.
Figure 124. Survival curve of Linux distributions derived from five widely-used parent distributions (identified in legend). Data from Lundqvist et al Lundqvist_12. code
Figure 125. Survival curve for packages included in the standard Debian distribution. Data from Caneill et al Caneill_14. code
Figure 126. Number of pdf files created using a given version of the portable document format appearing on sites having a .uk web address between 1996 and 2010. Data from Jackson Jackson_12. code
Figure 127. Percentage share of total Android market at days since launch for various versions of Android. Data from Villard Villard_15. code
Figure 128. Words in Intel x86 architecture manuals and code-points in Unicode Standard over time. Data for Intel x86 manual kindly provided by Baumann Baumann_16. code
Figure 129. Number of gcc compiler flags and options over time, with fitted regression models. Data from Fursin et al Fursin_14. code
Figure 130. Number of new programming languages, per year, described in a published paper. Data from Pigott et al Pigott_15. code
Figure 131. Number of monthly developer job related tweets specifying a given language. Data kindly provided by Destefanis Destefanis_14. code
Figure 132. Number of projects making use of a given number of different languages in a sample of 100,000 GitHub projects. Data kindly supplied by Bissyande Bissyande_13. code
Figure 133. Ranked order of the number of Android apps (1.1 million) and Ubuntu packages (71,199) linking to each supported POSIX function. Data from Atlidakis et al Atlidakis_16. code
Figure 134. Survival curves for Debian package lifetime and for a package to contain its first dependency conflict. Data from Drobisz et al Drobisz_15. code
Figure 135. Dependencies between the Java packages in various versions of ANTLR. Data from Al-Mutawa Al-Mutawa_13. code
Figure 136. Fraction of source in 130 releases of Linux (x-axis) that originates in an earlier release (y-axis). Data extracted from png file kindly supplied by Matsushita Livieri_07. code
Figure 137. Number of functions (in Evolution; the points at zero are incorrect counts) modified a given number of times (upper) or modified by a given number of different people (lower); red line is a bi-exponential fit, green/blue lines are the individual exponentials. Data from Robles et al Robles_12a. code
Figure 138. Number of functions (in Evolution) modified a given number of times broken down by number of authors. Data from Robles et al Robles_12a. code
Figure 139. Density plot of time interval, in hours, between each modification of a function in Evolution. Data from Robles et al Robles_12a. code
Figure 140. Survival curves of clones in the Linux high/medium/low level SCSI subsystems. Data from Wang Wang_12. code
Figure 141. Number of identifiers renamed, each month, in the source of Eclipse-JDT; version released on given date shown. Data from Eshkevari et al Eshkevari_11. code
Figure 142. Changes in the number of tables in the MediaWiki and Ensembl project database schemas over time. Data from Skoulis Skoulis_13. code
Figure 143. Survival curve for tables in the MediaWiki and Ensembl database schemas. Data from Skoulis Skoulis_13. code

Projects

Figure 144. Number of projects having a given duration (upper; 2,992 projects), delivered containing a given number of SLOC (middle; 1,859 projects), and using a given percentage of out-sourced effort (lower; 1,267 projects). Data extracted from Akita et al Akita_12. code
Figure 145. Firm bid price against schedule estimate, received from 14 companies, for the same tender specification. Data from Anda et al Anda_09. code
Figure 146. Distribution of effort (person hours) during the development of four engine control systems projects, plus non-project work and holidays, at Rolls-Royce. Data extracted from Powell Powell_01. code
Figure 147. Commits within a particular hour and day of week for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
Figure 148. Estimate given by three groups of subjects after seeing a statement by a middle manager containing an estimate (2 months or 20 months) or no estimate (control). Data from Aranda Aranda_05. code
Figure 149. Estimated and actual project implementation effort. Data from Jørgensen Jorgensen_04b and Kitchenham et al Kitchenham_02. code
Figure 150. Two estimates (in work hours), made by seven subjects, for each of six tasks. Data from Grimstad et al Grimstad_07. code
Figure 151. Density plot of number of projects investing a given fraction of their total effort in a given project phase. Data kindly provided by Wang Wang_17. code
Figure 152. Mean and median effort (hours) for projects having elapsed time between four and 20 months (lines are fitted quadratics). Data from Wang et al Wang_17. code
Figure 153. Estimated project cost from 12 estimating models. Data from Mohanty Mohanty_81. code
Figure 154. Elapsed weeks (x-axis) against effort in man-hours per week (y-axis) for a project, plus three fitted curves. Data extracted from Basili et al Basili_81. code
Figure 155. Function points and corresponding normalised costs for 149 projects from one large institution. Data extracted from Kampstra et al <book Kampstra_0?>. code
Figure 156. Cost per requirement, function point and story point for two projects, over 13 monthly releases. Data from Huijgens Huijgens_13. code
Figure 157. Estimated effort to implement 24 story points and corresponding COSMIC function points. Data from Commeyne et al Commeyne_16. code
Figure 158. Percentage profit/loss on 145 fixed-price software development contracts. Data extracted from Coombs Coombs_03. code
Figure 159. IBM’s profit margin on all System 360s sold in 1966, by system memory capacity in kilobytes; monthly rental cost during 1967 in parentheses. Data from DeLamarter DeLamarter_88. code
Figure 160. COSMIC function-points and compiled size (in kilobytes) of components in four different ECU modules; lines show fitted regression model. Data from Lind et al Lind_12. code
Figure 161. Mean LOC against standard deviation of LOC, for multiple implementations of seven distinct problems; grey line is fitted regression. Data from: Anda et al Anda_09, Jørgensen Jorgensen_16b, Lauterbach Lauterbach_87, McAllister et al McAllister_89, Selby et al Selby_85, Shimasaki et al Shimasaki_80, van der Meulen van_der_Meulen_07. code
Figure 162. Bids made by 19 estimators from the same company (divided into two groups for the experiment). Data from Jørgensen et al Jorgensen_04c. code
Figure 163. Initial implementation schedule, with employee number(s) given for each task (percentage given when not 100%) for a project. Data from Ge et al Ge_16. code
Figure 164. Estimated cost of developing a bespoke software system by the three companies contracted to do the work. Data from Yu Yu_03. code
Figure 165. Number of citations from Standard documents within protocol level to documents in the same and other levels (RTG routing, INT internet, TSV transport, RAI realtime applications and infrastructure, APP Applications, W3C recommendations). Data from Simcoe Simcoe_15. code
Figure 166. Effort, in person hours per month, used in the implementation of the five components making up the PAVE PAWS project (grey line shows total effort). Data extracted from Curtis et al Curtis_80. code
Figure 167. Percentage of actual project duration elapsed when 882 schedule estimates were made, during 121 projects, against estimated/actual time ratio (boundary maximum in red). Data kindly provided by Little Little_06. code
Figure 168. Initial estimated project duration against number of schedule estimates made before completion, for 121 projects; line is a loess fit. Data kindly provided by Little Little_06. code
Figure 169. Percentage change in 882 estimated delivery dates announced at a given percentage of the estimated elapsed time of the corresponding project, for 121 projects (red is a loess fit); blue line is a density plot of percentage estimated duration when estimate made. Data kindly provided by Little Little_06. code
Figure 170. Number of work packages completed within a given time; colored lines are work packages having the same estimated lead time. Data extracted from van Oorschot et al van_Oorschot_05. code
Figure 171. Phase during which work on a given activity of development was actually performed, average percentages over 13 projects. Data from Zelkowitz Zelkowitz_87. code
Figure 172. Percentage distribution of effort time (red) and schedule time (blue) across design/coding/testing for 38 NASA projects. Data from Condon et al Condon_93. code
Figure 173. Percentage distribution of effort across design/coding/testing for 10 ICL projects (red), 11 BT projects (green), 11 space projects (blue) and 12 defense projects (purple). Data from Kitchenham et al Kitchenham_85 and Graver et al Graver_77. code
Figure 174. Percentage of requirements added/deleted/modified in eight features (colored lines) of a product over 22 releases. Data extracted from Felici Felici_04. code
Figure 175. Pagerank of the stakeholders in the network created from the Open (red) and Closed (blue) stakeholder responses (values for each have been sorted). Data from Lim Lim_10. code
Figure 176. Average value assigned to requirements (red) and one standard deviation bounds (blue) based on omitting one stakeholder’s priority value list. Data from Regnell et al Regnell_01. code
Figure 177. Average number of days taken to implement a feature, over time; smoothed using a 25-day rolling mean. Data kindly supplied by 7Digital 7Digital_12. code
Figure 178. Number of features whose implementation took a given number of elapsed workdays; upper first 650-days, lower post 650-days. Fitted zero-truncated negative binomial distribution in green. Data kindly supplied by 7Digital 7Digital_12. code
Figure 179. Number of feature developments started on a given work day (red new features, green bug fixes, blue ratio of the two values; 25-day rolling mean). Data kindly supplied by 7Digital 7Digital_12. code
Figure 180. Survival curve of IT outsourcing suppliers continuing to work for 2,382 Credit Unions. Data kindly provided by Peukert Peukert_10. code
Figure 181. Growth, over 11 major releases in 28 years, of messages supported, command line options, kilo-words in product manual and thousand lines of code in PC-Lint. Data kindly provided by Gimpel Gimpel_14. code
Figure 182. Average number of staff required to support renewal of code having a given average lifetime. Data extracted from Elliott Elliott_77. code
Figure 183. Growth in the number of projects within the Apache ecosystem, along with the amount of contained code. Data from Bavota et al Bavota_13. code
Figure 184. Percentage overlap of developers contributing, during 2013, to both of each pair of 147 Apache projects. Data kindly provided by Panichella Bavota_15. code
Figure 185. Percentage of developers, employed by given companies, working on OpenStack at the time of a release (x-axis). Data from Teixeira et al Teixeira_15. code
Figure 186. Production rate of a team containing a given number of people, with communication overhead $t_0=t_1=0.1$ and various distributions of percentage communication time; black line is zero communications overhead. code

Reliability

Figure 187. Flow of updates between participants in one Android ecosystem; number of each kind of member given in brackets, number of updates shipped on edges. Data from Thomas Thomas_15. code
Figure 188. Reported faults against number of installations (upper) and age (lower). Data from the "wheezy" Debian release UDD_14. code
Figure 189. Duplicates of Eclipse fault report 4671 (report 6325 was finally chosen as the master report); arrows point to report marked as duplicate of an earlier report. Data from Sadat et al Sadat_17. code
Figure 190. Mean percentage likelihood of (translated) statements containing a probabilistic term; one colored line per country. Data from Budescu et al Budescu_14. code
Figure 191. Survival curve of the two most common warnings reported by Splint in Samba and Squid. Data from De Penta et al Di_penta_09. code
Figure 192. Survival rate of reported faults in Linux device drivers and other Linux subsystems… Data from Palix et al Palix_10b. code
Figure 193. Unit cost of a missile against the number of development test flights it had. Data extracted from Augustine Augustine_97. code
Figure 194. Number of reported incidents for each of 800 applications installed on over 120,000 desktop machines. Data from Lucente Lucente_15. code
Figure 195. Number of accesses to memory address blocks, per 100,000 instructions, executing gzip on two different inputs. Data from Brigham Young Brigham_Young via Feitelson. code
Figure 196. Transition counts of five distinct faults experienced in 50 runs of program A2; boxes labeled with the faults experienced up to that point. Data from Nagel et al Nagel_82. code
Figure 197. Number of input cases processed before a particular fault was experienced by program A2; the list is sorted for each distinct fault. Data from Nagel et al Nagel_82. code
Figure 198. Input cases processed by two implementations, during four replications, before a failure occurred; grey lines are a regression fit for each program. Data from Dunham et al Dunham_86. code
Figure 199. Number of input cases processed by program AT1 before a given fault was experienced, during 25 replications. Data from Dunham et al Dunham_86. code
Figure 200. Faults discovered against hours of testing, for four releases of a product. Data from Wood Wood_96. code
Figure 201. Time taken to discover a thread safety violation in 22 Java classes, violin plots for 10 runs involving each class. Data kindly supplied by Pradel Pradel_12. code
Figure 202. Percentage of reported problems having a given mean time to first problem occurrence (in months, summed over all installations of a product) for nine products. Data from Adams Adams_84. code
Figure 203. Number of times the same fault was experienced in two programs, crashes traced to the same program location; with fitted biexponential equation. Data kindly provided by Zhao Zhao_16. code
Figure 204. Predicted growth, with 95% confidence intervals, in the number of new crash faults found in the 2003, 2007 and 2010 releases of Microsoft Office. Data from Kaminsky et al Kaminsky_11. code
Figure 205. Number of program crashes traced to the same executable location, in the 2003, 2007 and 2010 releases of Microsoft Office (blue/purple lines are the two parts of biexponential fits). Data from Kaminsky et al Kaminsky_11. code
Figure 206. Number of instances of the same reported fault in GCC, with fitted biexponential regression model. Data from Sun et al Sun_16. code
Figure 207. Number of instances of the same reported fault in KDE, with fitted triexponential regression model. Data from Sadat et al Sadat_17. code
Figure 208. Amount of source (millions of lines) in each version broken down by the version in which it first appears. Data extracted from Massacci et al Massacci_11. code
Figure 209. Market share of Firefox versions between official release and end-of-support. Data from Jones Jones_12. code
Figure 210. Number of people with Internet access per 100 head of population in the developed world and the whole world. Data from ITU ITU_12. code
Figure 211. Amount of end-user usage of code originally written for Firefox version 1.0 by various other versions. Data extracted from Massacci et al Massacci_11. code
Figure 212. Fraction of usability problems found by a given number of test subjects/evaluations in 12 system evaluations, lines show fitted regression model for each system. Data extracted from Nielsen et al Nielsen_93. code
Figure 213. Probability the rounded value given has actually been rounded, given an estimate of the likelihood of rounding and the number of values likely to have been rounded; grey line shows 50% probability of rounding. code
Figure 214. Number of change requests having a given recorded time to decide whether needed and to implement. Data from Basili et al Basili_84. code
Figure 215. Min/max range of values (red/blue lines) and best value estimate (green circles), given by subjects interpreting the value likely expressed by statements containing ‘less than 100’ and ‘more than 100’. Data kindly provided by Cummins Cummins_11. code
Figure 216. Total number of implementations in each of 36 equivalence classes, plus both first and last competitor submissions. Data from van der Meulen et al van_der_Meulen_04. code
Figure 217. Cumulative number of defects logged against the POSIX standard, by defect classification. Data kindly provided by Josey OpenGroup_17. code
Figure 218. Ranked occurrences of compiler messages generated by submitted student Java and Python programs. Data from Pritchard Pritchard_15. code
Figure 219. Fraction of mutated programs, in various languages, that successfully compiled/executed/produced same output. Data from Spinellis et al Spinellis_12. code
Figure 220. Number of reported faults whose fixes involved a given number of files, modules or lines in a sample of 290 faults in AspectJ; lines are power laws fitted using regression. Data from Lucia Lucia_14. code
Figure 221. Normalized number of commits, made to fix reported faults, involving a given number of files in five software systems; grey line is power law fitted using regression. Data from Zhong et al Zhong_15 via M. Monperrus. code
Figure 222. Percentage of insertions/modifications of a given number of lines resulting in a reported fault; lines are fitted regression models. Data from Purushothaman et al Purushothaman_05. code
Figure 223. Survival curve (with 95% confidence bounds) of time to fix vulnerabilities reported in npm packages (Base) and time to update a package dependency (Depend) to a corrected version (i.e., not containing the reported vulnerability); for vulnerabilities with severity high and medium. Data from Decan et al Decan_18. code
Figure 224. Number of bit-flips in SRAM fabricated using various processes… Data kindly provided by Autran Autran_12. code
Figure 225. For systems 2 and 18, number of uptime intervals, binned into 10 hour intervals, red line is fitted negative binomial distribution. Data from Los Alamos National Lab (LANL). code
Figure 226. Fault slip throughs for a development project at Ericsson; y-axis lists phase when fault could have been detected, x-axis phase when fault was found. Data from Hribar et al Hribar_08. code
Figure 227. Number of vulnerabilities found using black-box testing and manual code review of nine implementations of the same specification. Data from Finifter Finifter_13b. code
Figure 228. Number of faults experienced per unit of testing effort, over a given number of weeks. Data from Stikkel Stikkel_06. code
Figure 229. Statement coverage achieved by the respective program’s test suite (data on sixth program not usable). Data from Marinescu et al Marinescu_14. code
Figure 230. Statement coverage against branch coverage for 300 or so Java projects; colored lines are fitted regression models for three program sizes, equal value line in grey. Data from Gopinath et al Gopinath_14. code
Figure 231. Number of statements executed along error and non-error paths within a function. Data kindly provided by Kang Kang_16. code
Figure 232. Basic block coverage against branch coverage for a 35 KLOC program. Data from Gokhale et al Gokhale_06. code
Figure 233. Fraction of basic blocks executed by a given number of tests, for 20 implementations using three test suites. Data from McAllister et al McAllister_89. code
Figure 234. Statement coverage against mutants killed for 300 or so Java projects; colored lines are fitted regression models for three program sizes, equal value line in grey. Data from Gopinath et al Gopinath_14. code

Source code

Figure 235. Number of source files, methods and lines of code, within methods, contained in each of 13,103 Java projects; lines are kernel density plots. Data kindly provided by Landman Landman_16. code
Figure 236. Number of methods/functions containing a given number of source lines; 17.6M methods, 6.3M functions. Data kindly provided by Landman Landman_16. code
Figure 237. Number of files and lines of code in 3,782 projects hosted on Sourceforge. Data from Herraiz Herraiz_08. code
Figure 238. Percentage of call instructions contained in code generated from the same C source, against call execution percentage for various processors; grey line is fitted regression model. Data from Davidson et al Davidson_89b. code
Figure 239. Time to compile, using -O3 optimization, each of 71,200 functions (in the SPEC benchmark) containing a given number of LLVM instructions; line shows fitted regression model for one trend in the data. Data kindly provided by Auler Auler_13. code
Figure 240. Number of feature constants against LOC for 40 C programs, with fitted regression line. Data from Liebig et al Liebig_10. code
Figure 241. Two sentences, with their dependency representations; upper sentence has total dependency length six, while in the lower sentence it is seven. Based on Futrell et al Futrell_15. code
Figure 242. One sentence containing four and the other eight propositions, along with their propositional analyses. Based on Kintsch et al Kintsch_73. code
Figure 243. Mean reading time (in seconds) for sentences containing a given number of propositions and as a function of the number of propositions recalled by subjects; with fitted regression models. Data extracted from Kintsch et al Kintsch_73. code
Figure 244. Subject confidence level, on a one to five scale (yes positive, no negative), of having previously seen a sentence containing a given number of idea units. Data extracted from Bransford et al Bransford_71. code
Figure 245. Percentage of false-positive recognition errors for biographies having varying degrees of thematic relatedness to the famous person, in before, after, famous, and fictitious groups. Data extracted from Dooling et al Dooling_77. code
Figure 246. Percentage of correct responses, for subjects having a given reading span, to the pronoun reference questions as a function of the number of sentences (x-axis) between the pronoun and the referent noun. Data extracted from Daneman et al Daneman_80. code
Figure 247. Hermann grid, with variation due to Ninio and Stevens Ninio_00 to create an extinction illusion. code
Figure 248. Time taken for subjects to read a page of text, printed with a particular orientation, as they read more pages (initial experiment and repeated after one year); with fitted regression lines. Results are for the same six subjects in two tests more than a year apart. Based on Kolers Kolers_76. code
Figure 249. Number of C function definitions containing a given number of identifier uses (unique and all). Data from Jones Jones_05a. code
Figure 250. Three versions of the source of the same program, showing identifiers, non-identifiers and in an anonymous form; illustrating how a reader’s existing knowledge of words can provide a significant benefit in comprehending source code. Based on an example from Laitinen Laitinen_95.
Figure 251. Probability (averaged over all cue words) that, for a given cue word, a given percentage of subjects will produce the same word. Data from Nelson et al Nelson_98. code
Figure 252. Mean response time for each of 17 segments; regression line fitted to segments 2-15. Data extracted from Lewicki et al Lewicki_88. code
Figure 253. Lines of code, Halstead’s volume and cyclomatic complexity of Linux version 2.6.9. Data from Israeli et al Israeli_10. code
Figure 254. Number of selection-statements having a given maximum nesting level; for embedded C Engblom_99a (whose data was multiplied by a constant to allow comparison; the data for nesting depth 5 was interpolated from the adjacent points) and based on the visible form of the .c files. Data from Jones Jones_05a. code
Figure 255. Number of files, in Eclipse projects, that have been modified by a given number of people, with fitted regression model. Data from Taylor Taylor_12. code
Figure 256. Number of Python source files containing a given number of SLOC; all files, and with duplicates removed. Data from Lopes et al Lopes_17. code
Figure 257. Number of commits of a given length, in lines added/deleted to fix various faults in Linux file systems. Data from Lu et al Lu_13. code
Figure 258. Mean compatibility of 50 applications to 11 versions of Python, over time. Data from Malloy et al Malloy_17. code
Figure 259. Cumulative number of developers who have committed Java source making use of new features added to the language. Data from Dyer et al Dyer_14. code
Figure 260. Number of reads and writes to the same variable, for 3,315 variables, made during the execution of the Mediabench suite. Data kindly provided by Caspi Caspi_00. code
Figure 261. to be decided… Data kindly provided by Suresh Suresh_15. code
Figure 262. Heat map of the fraction of a file’s basic blocks executed when performing a given feature of the SHARPE program. Data from Wong et al Wong_00. code
Figure 263. Number of optional features selected by a given number of flags. Data kindly provided by Berger Berger_12. code
Figure 264. Cumulative percentage of configuration options impacting a given number of source files in the Linux kernel. Data kindly provided by Ziegler Ziegler_16. code
Figure 265. Total if-statements against if-statements whose condition involves a null check, in each of 800 Java projects. Data kindly provided by Osman Osman_16. code
Figure 266. The number of dynamic statements, LOC and methods against total number of those constructs appearing in 28 Ruby programs; lines are power-law regression fits. Data from Rodrigues et al Rodrigues_18. code
Figure 267. Yearly occurrence of number words (e.g., "one", "twenty-two"), averaged over each year since 1960, in Google’s book data for three languages. Data kindly provided by Piantadosi Piantadosi_14. code
Figure 268. Percentage occurrence of the most significant digit of floating-point, integer and hexadecimal literals in C source code. Data from Jones Jones_05a. code
Figure 269. Number of distinct API methods called in 1,435 Java projects containing a given number of method calls, with regression fit. Data from Lämmel et al Lammel_11. code
Figure 270. Number of function calls, against corresponding number of calls containing callbacks and anonymous callbacks, in 130 JavaScript programs; lines are fitted regression models. Data from Gallaba et al Gallaba_15. code
Figure 271. Sequences of methods, from java.lang.StringBuilder, called on the same object; based on 3,418 Jar files. Data from Mendez et al Mendez_13. code
Figure 272. For each Java class, the percentage of method sequences containing a given number of calls, in 3,418 Jar files. Data from Mendez et al Mendez_13. code
Figure 273. Number of data and operation extensions to 1,560 Smalltalk class hierarchies that contain both kinds of extension; line is a fitted regression model. Data from Robbes et al Robbes_15. code
Figure 274. "Worth estimate" for the kind of method activity attribute (see [_ordering_of_items]). Data from Biegel et al Biegel_12. code
Figure 275. Number of C functions having a given number of unused parameters; straight lines are fitted regression models. Data from Jones Jones_05a. code

Stories told by data

Figure 276. Number of virus infections and UFO sightings, reported in 3,072 U.S. counties during 2010. Data from Jacobs et al Jacobs_14. code
Figure 277. Data having values following various visual patterns when plotted. code
Figure 278. Years of professional experience in a given language for experimental subjects. Data from Prechelt Prechelt_07. code
Figure 279. Total number of lines of C source, in .c and .h files, having a given length, i.e., containing a given number of characters (upper) and tokens (lower). Data from Jones Jones_05a. code
Figure 280. Various measurements of work performed implementing the same functionality, number of lines of Haskell and C implementing functionality, CFP (COSMIC function points; based on user manual) and length of formal specification. Data kindly provided by Staples Staples_13. code
Figure 281. Effort, in hours (log scale), spent in various development phases of projects written in Ada (blue) and Fortran (red). Data from Waligora et al Waligora_95. code
Figure 282. Performance of experts (e) and novices (n) in a test driven development experiment. Data from Muller et al Muller_07. code
Figure 283. Correlations between pairs of attributes of 12,799 GitHub pull requests to the Homebrew repo, represented using pie charts and shaded boxes. Data from Gousios et al Gousios_14. code
Figure 284. Hierarchical cluster of correlation between pairs of attributes of 12,799 GitHub pull requests to the Homebrew repo. Data from Gousios et al Gousios_14. code
Figure 285. Number of computers having a given SPECint result; line is a loess fit. Data from SPEC SPEC_14. code
Figure 286. Effort invested in project definition (as percentage of original estimate) against cost overrun (as percentage of original estimate). Data extracted from Gruhl Gruhl_9x. code
Figure 287. Relative clock frequency of cpus when first launched (1970 == 1). Data from Danowitz et al Danowitz_12. code
Figure 288. Year and age at which survey respondents started contributing to FLOSS, i.e., made their first FLOSS contribution. Data from Robles et al Robles_14. code
Figure 289. Number of computers with a given SPECint result, summed within 13 equal width bins (upper) and kernel density plot (lower). Data from SPEC SPEC_14. code
Figure 290. Number of commits containing a given number of lines of code made when making various categories of changes to the Linux filesystem code (upper) and a density plot of the same data (lower). Data from Lu et al Lu_13. code
Figure 291. Histogram of the log of some measured quantity. code
Figure 292. Developer estimated effort against actual effort (in hours), for various maintenance tasks, e.g., adaptive, corrective and perfective; upper as-is, middle jittered values and lower size proportional to the log of the number of measurements. Data from Hatton Hatton_07. code
Figure 293. Number of installations of Debian packages against the age of the package; middle plot was created by smoothScatter and lower plot by contour. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
Figure 294. Number of lines added to glibc each week. Data from González-Barahona et al Gonzalez-Barahona_14. code
Figure 295. Boxplot of time between a potential mistake in Eclipse being reported and the first response to the report; right plot is notched. Data from Breu et al Breu_10. code
Figure 296. Violin plot of time between bug being reported in Eclipse and first response to the report. Data from Breu et al Breu_10. code
Figure 297. Time taken for developers to debug various programs using batch processing or online (i.e., time-sharing) systems. Data kindly provided by Prechelt Prechelt_99a. code
Figure 298. Pairs of languages used together in the same GitHub project with connecting line width, color and transparency related to number of occurrences. Data kindly supplied by Bissyande Bissyande_13. code
Figure 299. References from one document to another in the Microsoft Server Protocol specifications. Data extracted by the author from the 2009 document release WSPP_15. code
Figure 300. Alluvial plot of relative prioritization order of selection and application of GitHub pull requests. Data from Gousios et al Gousios_15a. code
Figure 301. Intel Sandy Bridge L3 cache bandwidth in GB/s at various clock frequencies and using combinations of cores (0-3 denotes cores zero-through-three, 0,2,4 denotes the three cores: zero, two and four). Data from Schone et al Schone_12. code
Figure 302. Contour plot of the number of sessions executed on a computer having a given processor speed and memory capacity. Data kindly provided by Thereska Thereska_10. code
Figure 303. Root source of 1,257 faults and where fixes were applied for 21 large safety critical applications. Data from Hamill et al Hamill_14. code
Figure 304. Ternary plots drawn with two possible visual aids for estimating the position of a point (red plus at x=0.1, y=0.35, z=0.55); axis names appear on the vertex opposite the axis they denote. code
Figure 305. Earth relative positions of NASA’s Orbview-2 spacecraft when it experienced a single event upset (in blue) on 12 July 2000. Data kindly provided by LaBel Poivey_03. code
Figure 306. Estimated market share of Android devices by brand and product, based on downloads from 682,000 unique devices in 2015. Data from OpenSignal OpenSignal_15. code
Figure 307. Variables having a given number of read accesses, given 25, 50, 75 and 100 total accesses, calculated from running the weighted preferential attachment algorithm (red), the smoothed data (blue) and a fitted exponential (green). code
Figure 308. Throughput when running the SPEC SDM91 benchmark on a Sun SPARCcenter 2000 containing 8 CPUs, with the predictions from three fitted queuing models. Data from Gunther Gunther_05. code
Figure 309. Illustration of the difference in cognitive effort needed to locate points differing by shape or color. code
Figure 310. The three, seven and twelve color palettes returned by calls to the diverge_hcl, sequential_hcl, rainbow_hcl and rainbow functions. code
Figure 311. Percentage share of the Android market by successive Android releases, by individual version (top) and by date (lower); pastel colors on left and bold on right. Data from Villard Villard_15. code
Figure 312. Input case on which a failure occurred, for a total of 500,000 inputs; values plotted using a linear (upper) and logarithmic (lower) x-axis. Data from Dunham et al Dunham_86. code
Figure 313. Illustration of U-shape created when y-axis values are a ratio calculated from x-axis values. code
Figure 314. Mean time to fail for systems of various sizes (measured in lines of code); linear y-axis left, log y-axis right. Data extracted from Figure 8.3 of Putnam et al Putnam_92. code
Figure 315. Alternative representation of numeric values in Table. Data from Scott Scott_16. code
Figure 316. What’s up doc? Perhaps, not the expected pattern in the data. Equations from White White_12. code

Probability

Figure 317. Probability that three (red) or four (blue) consecutive false positive warnings occur in some total number of warnings (false positive rate appears on line). code
Figure 318. The relationship between words for tracts of trees in various languages. The interpretation given to words (boundary indicated by the zigzags) in one language may overlap that given in other languages. Adapted from DiMarco et al DiMarco_93.
Figure 319. Relationships between commonly used discrete and continuous probability distributions.
Figure 320. Shapes of commonly encountered discrete probability distributions (upper to lower: Uniform, Geometric, Binomial and Poisson). code
Figure 321. Cumulative density plots of the discrete probability distributions in Figure. code
Figure 322. Commonly encountered continuous probability distributions (upper to lower: Uniform, Exponential, Normal, Beta). code
Figure 323. Samples of randomly selected values drawn from the same normal distribution (left: 100 points in each sample, right 1,000 points in each sample). code
Figure 324. Reading rate for text printed using a serif (blue) and sans-serif (red) font, data has been normalised and displayed as a density. Data from Veytsman et al Veytsman_12. code
Figure 325. Probability, with p-value < 0.05, that shapiro.test correctly reports that samples drawn from various distributions are not drawn from a Normal distribution, and probability of an incorrect report when the sample is drawn from a Normal distribution; 1,000 replications for each sample size. code
Figure 326. Number of conditionally compiled code sequences dependent on a given number of feature macros (red overwritten by blue: Linux, blue: FreeBSD). Data from Berger et al Berger_10. code
Figure 327. Percentage occurrence of statements for each of 100 or so C, C++ and Java programs, plotted as a density on the y-axis. Data from Zhu et al Zhu_15. code
Figure 328. A Cullen and Frey graph for the $3n+1$ program length data. Data kindly provided by van der Meulen van_der_Meulen_07. code
Figure 329. Number of 3n+1 programs containing a given number of lines and four distributions fitted to this data. Data kindly provided by van der Meulen van_der_Meulen_07. code
Figure 330. A zero-truncated Negative Binomial distribution fitted to the number of features whose implementation took a given number of elapsed workdays; first 650 days used. Data kindly provided by 7digital 7Digital_12. code
Figure 331. Density plot of MPI micro-benchmark runtime performance for calls to MPI_Allreduce with 1,000 Bytes (left curve) and to MPI_Scan with 10,000 Bytes (right curve). Data kindly supplied by Hunold Hunold_14. code
Figure 332. Mixture model fitted by the normalmixEM function to the performance data from calls to MPI_Allreduce. Data kindly supplied by Hunold Hunold_14. code
Figure 333. Density plot of accesses to one article on Slashdot, in minutes since its publication. The distinct Normal distributions (colored and fitted to the log of the data) contained in the mixture models fitted by the REBMIX (upper) and normalmixEM (lower) functions. Data kindly supplied by Kaltenbrunner Kaltenbrunner_07. code
Figure 334. Cumulative probability distribution of file size (red) and of number of bytes occupied in a file system (blue). Data from Irlam Irlam_93. code
Figure 335. Graph of available state transitions for Alaris volumetric infusion pump (the button presses that cause transitions between states are not shown). Data kindly supplied by Oladimeji Oladimeji_08. code
Figure 336. Discrete-time Markov chain for created/modified/deleted status of Linux kernel files at each major release from versions 2.6.0 to 2.6.39. Data from Tarasov Tarasov_12. code
Figure 337. Directed graph of emails between FreeBSD and OpenBSD developers, plus a few people involved in both discussions, with developers who sent/received less than four emails removed. Data from Canfora et al Canfora_11. code
Figure 338. Expected probability of a single instance (y-axis) against the probability of a measured struct type having grouped member types (x-axis); when both probabilities are the same points will be along the blue line. Data from Jones Jones_09b. code

Statistics for software engineering

Figure 339. Example of a sample drawn from a population. code
Figure 340. Date of introduction of a cpu against its commercial lifetime. Data from Culver Culver_10. code
Figure 341. A population of items having one of three colors and three strata sampled from it. code
Figure 342. Power consumed by three SERT benchmark programs at various levels of system load; crosses at 2% load intervals, lines based on 10% load intervals. Data kindly provided by Kistowski Kistowski_15. code
Figure 343. Distribution of 4,000 sample means for two sample sizes drawn from exponential (left), lognormal (center) and Pareto (right) distributions, vertical lines are 95% confidence bounds. The blue curve is the Normal distribution predicted by theory. code
Figure 344. Mean (red) and standard deviation (grey lines; they are not symmetrical because of the log scaling) of samples of 3 items drawn from a population of 1,000 items (blue line mean, green line standard deviation). Data kindly provided by Chen Chen_12. code
Figure 345. Density plot of mean of samples containing 3 or 12 items randomly selected from a data set of 1,000 items; process repeated 1,000 times for each sample size. Data kindly provided by Chen Chen_12. code
Figure 346. Number of commits to glibc for each day of the week, for the years from 1991 to 2012. Data from González-Barahona et al Gonzalez-Barahona_14. code
Figure 347. A Normal distribution with mean=4 and variance=8 and a Chi-squared distribution with four degrees of freedom having the same mean and variance (the vertical lines are at the distributions' median value). code
Figure 348. Density plot of execution time of 1,000 input data sets, with lines marking the mean, median and mode. Data kindly supplied by Chen Chen_12. code
Figure 349. Impact of serial correlation, AR(1) in this example, on the calculated mean (upper) and standard deviation (lower) of a sample (the legends specify the amount of serial correlation). code
Figure 350. Occurrence of sample median and mean values for 1,000 samples drawn from a binomial distribution. code
Figure 351. A contaminated normal, values drawn from two normal distributions with 10% of values drawn from a distribution having a standard deviation five times greater than the other. code
Figure 352. Regression model (red line; pvalue=0.02) fitted to the number of correct/false security code review reports made by 30 professionals; blue lines are 95% confidence intervals. Data from Edmundson et al Edmundson_13. code
Figure 353. Bootstrapped regression lines fitted to random samples of the number of correct/false security code review reports made by 30 professionals. Data from Edmundson et al Edmundson_13. code
Figure 354. Kernel density plot, with 95% confidence interval, of the number of computers having the same SPECint result. Data from SPEC SPEC_14. code
Figure 355. The four related quantities in the design of experiments. code
Figure 356. Examples of the impact of population prevalence, statistical power and p-value on number of false positives and false negatives. code
Figure 357. Visualization of Cohen’s $d$ for two normal distributions having different means and the same standard deviation (two left) and both different (right). code
Figure 358. The impact of differences in mean and standard deviation on the overlap between two populations ($\alpha$: probability of making a false positive error, and $\beta$: probability of making a false negative error). code
Figure 359. The power of a statistical test at detecting that a difference exists between the mean values of samples drawn from two populations, both having a Normal distribution; actual mean difference adjacent to colored line. code
Figure 360. Power analysis (50 and 10 runs at various p-values) of detecting a difference between two runs having a binomial distribution (runs needed to achieve power=0.8 at various p-values). code

Regression modeling

Figure 361. Relationship between data characteristics (edge labels) and applicable techniques (node labels) for building regression models.
Figure 362. Total lines of source code in FreeBSD by days elapsed since the project started (in 1993). Data from Herraiz Herraiz_08. code
Figure 363. Estimated cost and duration of 73 large Dutch federal IT projects, along with fitted model and 95% confidence intervals. Data from Kampstra et al Kampstra_09. code
Figure 364. Number of updates and fixes in each Linux release between version 2.6.11 and 3.2. Data from Corbet et al Corbet_12. code
Figure 365. The number of commits made and the number of contributing developers for Linux versions 2.6.0 to 3.12. The green line in the right plot is the regression model fitted by switching the x/y values. Data from Kroah-Hartman Kroah-Hartman_14. code
Figure 366. Effort/Size of various projects and regression lines fitted using Effort as the response variable (red, with green 95% confidence intervals) and Size as the response variable (blue). Data from Jørgensen et al Jorgensen_03. code
Figure 367. Lines of code in every initial release (i.e., excluding bug-fix versions of a release) of the Linux kernel since version 1.0, along with fitted straight line (upper) and quadratic (lower) regression models. Data from Israeli et al Israeli_10. code
Figure 368. Actual (left of vertical line) and predicted (right of vertical line) total lines of code in Linux at a given number of days since the release of version 1.0, derived from a regression model built from fitting a cubic polynomial to the data (dashed lines are 95% confidence bounds). Data from Israeli et al Israeli_10. code
Figure 369. Number of classes in the Groovy compiler at each release, in days since version 1.0. Data from Vasa Vasa_10. code
Figure 370. For each distinct language, the number of lines committed on Github and the number of questions tagged with that language. Data from Kunst Kunst_13. code
Figure 371. Percentage of vulnerabilities detected by developers working a given number of years in security. Data extracted from Edmundson et al Edmundson_13. code
Figure 372. Hours to develop software for 29 embedded consumer products and the amount of code they contain, with fitted regression model and loess fit (yellow). Data from Fenton et al Fenton_08. code
Figure 373. Points remaining after removal of overly influential observations, repeatedly applying Cook’s distance and Studentized residuals. Data from Fenton et al Fenton_08. code
Figure 374. Points remaining after removal of overly influential observations, also taking into account the Bonferroni p-value of the Studentized residuals; the line shows the fitted model and 95% confidence interval (loess fit in yellow). Data from Fenton et al Fenton_08. code
Figure 375. influenceIndexPlot for the model having the fitted line shown in Figure. Data from Fenton et al Fenton_08. code
Figure 376. Number of medical devices reported recalled by the US Food and Drug Administration, in two week bins. Upper: fitted straight line and confidence bounds, with loess fit (green); Lower: straight line (purple) fitted after two outliers replaced by mean and original fit (red). Data from Alemzadeh et al Alemzadeh_13. code
Figure 377. influenceIndexPlot of data from Alemzadeh et al Alemzadeh_13. code
Figure 378. Two fitted straight lines and confidence intervals, one up to the end of 2010 and one after 2010. Data from Alemzadeh et al Alemzadeh_13. code
Figure 379. Results from various studies of software requirements function points counted using COSMIC and FPA; lines are loess fits to studies based on industry and academic counters. Data from Amiri et al Amiri_11. code
Figure 380. Five different equations fitted to the Embedded subset of the COCOMO 81 data before influential observation removal (upper) and after influential observation removal (lower). Data from Boehm Boehm_81. code
Figure 381. Anscombe data sets with Pearson correlation coefficient, mean, standard deviation, and line fitted using linear regression. Data from Anscombe Anscombe_73. code
Figure 382. Residual of the straight line fit to the Linux growth data analysed in Figure (upper) and data+straight line fit (red) and loess fit (blue). Data from Israeli et al Israeli_10. code
Figure 383. Array element assignment benchmark compiled with gcc using the O0 (upper) and O3 (lower) options (measurements were grouped into runs of 2,000 executions). Data from Flater et al Flater_13. code
Figure 384. Number of installations of Debian packages against the age of the package, plus fitted model and loess fit. Data from the "wheezy" version of the Ultimate Debian Database project UDD_14. code
Figure 385. Quadratic relationship with various amounts of added noise fitted using a quadratic and exponential model. code
Figure 386. Author workload against number of activity types per author (upper) and ratio test (lower). Data from Vasilescu et al Vasilescu_12. code
Figure 387. Change-points detected by cpt.mean, upper using method="AMOC" and lower using method="PELT". Data from Alemzadeh et al Alemzadeh_13. code
Figure 388. Number of flags (y-axis jittered) used to control the selection of optional features in systems containing a given total number of features; loess curve (red), regression line (green). Data from Berger et al Berger_12. code
Figure 389. Monthly unit sales (in thousands) of 4-bit microprocessors. Data kindly supplied by Turley Turley_02. code
Figure 390. Fitted regression line to points (in red) and 3-D illustration of assumed Normal distribution of errors. code
Figure 391. Number of vulnerabilities detected by professional developers with web security review experience; upper: technically correct plot of model fitted using a Poisson distribution, lower: easier to interpret curve representation of fitted regression models that assume the error has a Poisson distribution (continuous lines) or a Normal distribution (dashed lines). Data extracted from Edmundson et al Edmundson_13. code
Figure 392. Number of functions containing a given number of break statements and a fitted Negative Binomial distribution. Data from Jones Jones_05a. code
Figure 393. Code review meeting duration for a given number of non-comment lines of code; fitted regression model, assuming errors have a Gamma distribution (red, with confidence interval in blue) or a Normal distribution (green). Data from Porter et al Porter_98. code
Figure 394. Number of APIs used in Java programs containing a given number of lines and three fitted models. Data from Starek Starek_10. code
Figure 395. Yearly development cost and lines of Fortran code delivered to the US Air Force between 1962 and 1984, with fitted regression models. Data extracted from NeSmith NeSmith_86. code
Figure 396. Maintenance task effort and lines of code added+updated, with fitted regression model (red) and SIMEX adjusted for 10% error (blue). Data from Jørgensen Jorgensen_95. code
Figure 397. Regression modeling 0/1 data with a straight line and a logistic equation. code
Figure 398. ROC curve for the data listed in Table. code
Figure 399. Percentage of mutants killed at various percentages of path coverage for 300 or so Java projects; fitted Beta (red) and glm (blue) regression models. Data from Gopinath et al Gopinath_14. code
Figure 400. SPECint 2006 performance results for processors running at various clock rates, memory chip frequencies and processor family. Data from SPEC SPEC_14. code
Figure 401. Component+residual plots for three explanatory variables in a fitted SPECint model. code
Figure 402. Individual contribution of each explanatory variable to the response variable in a quadratic model of SPECint performance. code
Figure 403. Estimated and actual effort broken down by communication frequency, along with individually fitted straight lines. Data from Moløkken-Østvold et al Molokken_Ostvold_07. code
Figure 404. Illustration of the shared and non-shared contributions made by two explanatory variables to the response variable Y. code
Figure 405. pairs plot of lines added/modified/removed, growth and number of files and total lines in versions 2.6.0 through 3.9 of the Linux kernel. Data from Kroah-Hartman Kroah-Hartman_14. code
Figure 406. Example plots of functions listed in Table. These equations can be inverted, so they start high and go down. code
Figure 407. Time to execute a computational biology program on systems containing processors with various L2 cache sizes. Data kindly provided by Hazelhurst Hazelhurst_10. code
Figure 408. A logistic equation fitted to the lines of code in every non-bugfix release of the Linux kernel since version 1.0. Data from Israeli et al Israeli_10. code
Figure 409. Predictions by logistic equations fitted to Linux SLOC data, using subsets of data up to 2900, 3650 and 4200 days, and all days, since the release of version 1.0. Data from Israeli et al Israeli_10. code
Figure 410. Increase in areal density of hard disks entering production over time. Data from Grochowski et al Grochowski_12. code
Figure 411. Lines of code in the GNU C library against days since 1 January 1990. Data from González-Barahona Gonzalez-Barahona_14. code
Figure 412. Number of failing programs caused by unique faults in gcc (upper) and SpiderMonkey (lower). Fitted model in green, with two exponential components in red and blue. Data kindly provided by Chen Chen_13. code
Figure 413. Power law (red) and exponential (blue) fits to feature macro usage in 20 systems written in C; fail to reject p-value for 20 systems is 0.64. Data from Queiroz et al Queiroz_17. code
Figure 414. Power consumption of six different Intel Core i5-540M processors running at various frequencies; colored lines denote fitted regression models for each processor. Data from Balaji et al Balaji_12. code
Figure 415. Example showing the three ways of structuring a mixed-effects model, i.e., different intersections/same slope (upper), same intersection/different slopes (middle) and different intersections/slopes (lower). code
Figure 416. Confidence intervals, 95%, for within-subject intercept and slope (right plots) of mixed-effects models in the adjacent code. code
Figure 417. The three components of the hourly rate of commits, during a week, to the Linux kernel source tree; components extracted from the time series by stl. Data from Eyolfson et al Eyolfson_11. code
Figure 418. Autocorrelation of number of defects found on a given day, for development project C. Data kindly provided by Buettner Buettner_08. code
Figure 419. Autocorrelation of two AR models (upper plots) and two MA models (lower plots). code
Figure 420. Partial autocorrelation of same two AR models (upper plots) and two MA models (lower plots) shown in Figure. code
Figure 421. Autocorrelation of indentation of source code written in various languages. Data from Hindle et al Hindle_08. code
Figure 422. Number of features started for each day and fitted regression trend line (left) and number of features after subtracting the trend (right), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
Figure 423. Autocorrelation (left) and partial autocorrelation (right) of the number of features started on a given day (after differencing the log transformed data), over the entire period of the 7digital data. Data kindly supplied by 7Digital 7Digital_12. code
Figure 424. Predicted daily difference in the number of new feature starts (red) and 95% confidence intervals (blue). Data kindly supplied by 7Digital 7Digital_12. code
Figure 425. Time series whose values are uncorrelated (upper), but whose squared values are correlated (lower); see code for generation process. code
Figure 426. Cross correlation of feature release ‘size’ (upper non-bugfix releases, lower all releases) and date when bugs are prioritised. Data kindly supplied by 7Digital 7Digital_12. code
Figure 427. Estimated staff working on a project during every week. Data from Buettner Buettner_08. code
Figure 428. Market share of Firefox version 3.0 fitted using loess regression with various values of the span option. Data from W3Counter W3Counter_14. code
Figure 429. Cross-correlation of source lines added/deleted per week to the glibc library. Data from González-Barahona Gonzalez-Barahona_14. code
Figure 430. Visualization of alignment between weekly time series of lines of code in NetBSD (blue) and FreeBSD (red). Data from Herraiz Herraiz_08. code
Figure 431. Effort distribution (person hours) over the eight main tasks of a development project at Rolls-Royce and a hierarchical clustering of each task effort time series based on pair-wise correlation and Euclidean distance metrics. Data extracted from Powell Powell_01. code
Figure 432. Two commonly used hazard functions: Weibull, which is monotonic (always increases, decreases or remains the same), and Lognormal, which can increase and then decrease. code
Figure 433. Observation period with events inside and outside the study period. code
Figure 434. The Kaplan-Meier curve for survivability of new releases: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals. Data from Businge Businge_13. code
Figure 435. The Kaplan-Meier curve for survivability of ETPs’ ability to be built using SDK released in subsequent years: (blue) ETPs using only official APIs, (red) ETPs calling internal APIs; dotted lines are 95% confidence intervals, with plus signs, +, indicating censored data. Data from Businge Businge_13. code
Figure 436. Kaplan-Meier curves for time-to-fix…. Data from Arora et al Arora_10. code
Figure 437. Survival curve after adjustment for explanatory variables… code
Figure 438. Cumulative incidence curves for problems reported by the splint tool in Samba and Squid (time is measured in number of snapshot releases). Data from Di Penta et al Di_penta_09. code
Figure 439. Rose diagram of number of commits in each 3 hour period of a day for Linux and FreeBSD. Data from Eyolfson et al Eyolfson_11. code
Figure 440. The Cartwright (red; dcarthwrite), wrapped Cauchy (green; dwrappedcauchy) and wrapped von Mises (blue; dvonmises) circular probability distributions for various values of their parameters. code
Figure 441. Asymmetric extended wrapped forms of the Cardioid (upper), von Mises (middle) and Cauchy (lower) probability distributions for various values of their parameters. code
Figure 442. Number of commits (upper) and number of commits in which a fault was detected (lower) by hour of day of the commit, for Linux. Data from Eyolfson et al Eyolfson_14. code
Figure 443. Number of commits per hour for weekdays and fitted model (upper) and number of commits in which a fault was detected (lower), for Linux. Data from Eyolfson et al Eyolfson_14. code
Figure 444. Number of commits per hour for each weekday, fitted using $\cos(...\cos...)$ (upper) and $\cos(...\cos+\sin...)$ (lower), for Linux; in both cases the fitted fault model (red) has been rescaled to allow comparison. Data from Eyolfson et al Eyolfson_14. code
Figure 445. Application source lines against percentage of covered lines achieved by both Human & Dynodroid tests, only by Dynodroid tests and only by Human tests. Data from Machiry et al Machiry_13. code
Figure 446. Percentage of source lines covered by both Human & Dynodroid tests, only by Dynodroid tests and only by Human tests; fitted regression line and prediction points for various total source lines, red plus. Data from Machiry et al Machiry_13. code

Other techniques

Figure 447. Volume of unit sphere in 1 to 50 dimensions, e.g., sphere has volume $\frac{4}{3}\pi$ in three dimensions. code
Figure 448. Top levels of the decision tree built from the reopened fault data. Data from Shihab et al Shihab_10a. code
Figure 449. A Bertin plot for items included in the same data structure as ‘Antibiotics used’, for each subject, after reordering by seriate. Data from Jones Jones_09b. code
Figure 450. A visualization of the Robinson matrix based on number of times pairs of items co-occur in the same data structure (the closer to the diagonal the more often they occur together). Data from Jones Jones_09b. code
Figure 451. Aggregated ranking of snippets by subjects in years 1 and 2 (red and black) and years 2 and 4 (black and blue); snippets have been sorted by year 2 ranking. Data from Buse et al Buse_10. code
Figure 452. "Worth estimate" of identifier access control (visibility), for the four kinds of definitions that can appear within a Java class. Data from Biegel et al Biegel_12. code

Experiments

Figure 453. Time taken to transfer and multiply 2-dimensional matrices of various sizes on a GTX 480 GPU. Data kindly supplied by Gregg and Hazelwood Gregg_11. code
Figure 454. Relative performance (y-axis) of libraries optimized to run on various processors (x-axis). Data from Bird Bird_10. code
Figure 455. Number of integer constants having the lexical form of a decimal-constant (the literal 0 is also included in this set) and hexadecimal-constant that have a given value. Data from Jones Jones_05a. code
Figure 456. One and two-sided significance testing. code
Figure 457. A cube plot of three configuration factors and corresponding benchmark results (blue) from Memory table experiment. Data from Citron et al Citron_03b. code
Figure 458. Design plot showing the impact of each Memo table configuration factor on benchmark performance. Data from Citron et al Citron_03b. code
Figure 459. Interaction plot showing how cint changes with size for given values of associativity and mapping. Data from Citron et al Citron_03b. code
Figure 460. Half-normal plot of data from a Plackett and Burman design experiment. Data from Debnath et al Debnath_08. code
Figure 461. Number of Reflection benchmark results achieving a given score, reported for GTX 970 cards from three third-party manufacturers. Data extracted from UserBenchmark.com. code
Figure 462. Density plots of project bids submitted by companies before/after seeing a requirements document. Data from Jørgensen et al Jorgensen_04c. code
Figure 463. Density plot of task implementation estimates: with no instructions (red) and with instruction on what to do (blue). Data from Jørgensen et al Jorgensen_04. code
Figure 464. Examples of correlation between samples of two value pairs, plotted on x- and y-axis. code
Figure 465. Number of software faults having a given consequence, based on an analysis of faults in Cassandra. Data from Gunawi et al Gunawi_14. code
Figure 466. Performance and rental cost of early computers, with straight line fits for a few years. Data from Knight Knight_66. code
Figure 467. Feature size, in Silicon atoms, of microprocessors. Data from Danowitz et al Danowitz_12. code
Figure 468. Maximum number of records sorted in 1 minute and using 1 penny’s worth of system time (upper). SPEC2006 integer benchmark results (lower). Data from Gray et al Gray_14 and SPEC SPEC_14. code
Figure 469. Total system power consumed when sorting 10, 20, 30, 40, 50 million integers (colored pluses) using three techniques running on the same processor at different clock frequencies. Data from Götz et al Gotz_14. code
Figure 470. Power consumed by 10 Atmel SAM3U microcontrollers at various temperatures when sleeping or running. Data from Wanner et al Wanner_10. code
Figure 471. Power spectrum of electrical power consumed by the Botanica App executing on a BeagleBone Black running Android 4.2.2. Data from Saborido et al Saborido_15. code
Figure 472. Read bandwidth at various offsets for new disks sold in 2002 (upper) and 2006 (lower). Data kindly provided by Krevat Krevat_13. code
Figure 473. Average power consumed by one server’s CPU (four Pentium 4 Xeons; red) and memory (8 GB PC133 DIMMs; blue) running the SPEC CPU2006 benchmark (upper) and breakdown by system component when executing various programs (lower). Data from Bircher Bircher_10. code
Figure 474. Time taken to find a unique item in arrays of various size, containing distinct items, using various search algorithms; grey lines are L1, L2 and L3 processor cache sizes. Data from Khuong et al Khuong_15. code
Figure 475. FFT benchmark executed 2,048 times followed by system reboot, repeated 10 times. Data kindly provided by Kalibera Kalibera_05. code
Figure 476. Percentage change, relative to no environment variables, in perlbench performance as characters are added to the environment. Data extracted from Mytkowicz et al Mytkowicz_08. code
Figure 477. Changes in SPEC CPU2006 benchmark performance caused by cache and memory bus contention for one dual processor Intel Xeon E5345 system. Data kindly provided by Babka Babka_12. code
Figure 478. Execution time of 330.art_m, an OpenMP benchmark program, using different compilers, number of threads and setting of thread affinity. Data kindly provided by Mazouz Mazouz_13. code
Figure 479. Access times when walking through memory using three fixed stride patterns (i.e., 32, 64 and 128 bytes) on a quad-core Intel Xeon E5345; grey lines at one standard deviation. Data kindly provided by Babka Babka_09. code
Figure 480. Performance variation of programs from the Talos benchmark run on original OS and a stabilised OS. Data from Larres Larres_12. code
Figure 481. Operations per second of a file-server mounted on one of ext2, ext3, rfs and xfs filesystems (same color for each filesystem) using various options. Data kindly supplied by Huang Zhou_12. code
Figure 482. Percentage change in SPEC number, relative to version 4.0.4, for 12 programs compiled using six different versions of gcc (compiling to 64-bits with the O3 option). Data from Makarow Makarow_14. code
Figure 483. Execution time of xy file compressor, compiled with gcc using various optimization options, running on various systems (lines are mean execution time when compiled using each option). Data kindly supplied by Petkovich de_Oliveira_13. code
Figure 484. Execution time of Perlbench, from SPEC benchmark, on six systems, when linked in three different orders and address randomization on/off. Data kindly supplied by Reidemeister de_Oliveira_13. code
Figure 485. Performance of PassMark memory benchmark on 783 Intel Core i7-3770K systems; lower plot created by trimming 10% of values from the ends of what appears in the upper plot. Data kindly supplied by David Wren PassMark_14. code
Figure 486. Ubench cpu performance on small (upper) and large (lower) EC2 instances, Europe in red and US in green. Data kindly provided by Dittrich Schad_10. code
Figure 487. Number of lines of code that 101 professional developers, with a given number of years of experience, estimate they have written; lines are various regression fits. Data from Jones Jones_06a, Jones_08a, Jones_09b. code

Overview of R

Figure 488. Plot produced by hello_world.R program. code
Figure 489. The unique bytes per window (256 bytes wide) of a pdf file. code

Data preparation

Figure 490. Screen height and width reported by 682,000 unique devices that downloaded an App from OpenSignal in 2015 (upper), reported measurements ordered so height is always the larger value (lower). Data from OpenSignal OpenSignal_15. code
Figure 491. Number of reported vulnerabilities, per day, in the US National Vulnerability Database for 2003. Data from the National Vulnerability Database NVD_14. code
Figure 492. Percentage occurrence of the first digit of hexadecimal numbers in C source and estimated from Google book data. Data from Jones Jones_05a and Michel et al Michel_11. code
Figure 493. Number of processes executing for a given amount of time, with measurements expressed using two and six significant digits. Data from Feitelson Feitelson_14. code