Some Covid-19 Charts Can Be A Bit Misleading…

Trist'n Joseph
Jun 14, 2020
4 min read

Most persons tend to agree that looking at a graph and understanding the underlying message is much easier than reading through a table of numbers, especially with large data sets. Thus, graphs are a common method of visualizing the relationships in data. Graphs are not just ‘easier to understand’; they allow data visualization experts to highlight possible features of interest within data. With the coronavirus essentially crippling many economies, experts are interested in mitigation methods that will keep communities safe while also allowing for the reopening of businesses. Data visualization is a key step within this process. Therefore, many agencies are constantly publishing graphs depicting Covid-19’s growth trajectory within a location, and lobbying for plans to help flatten the curve.

However, not all graphs are created equal. In fact, some coronavirus graphs can actually be misleading if you do not know exactly what you’re looking at.

1. The Time Axis

Most graphs do not plot time by calendar date. Rather, they show the number of days since a location recorded a particular number of cases. Although this might seem like a subtle difference, it is done for good reason. Imagine looking at an extremely long x-axis with different location curves beginning at different dates; this has the potential to become confusing to interpret. Additionally, plotting time as dates would make it more difficult to compare Covid-19’s growth, expressed in days, between locations. Counting days from a common starting point allows persons to easily see the difference in growth between locations and determine whether two locations are on the same path, regardless of whether one began recording cases before the other. This method is also useful because it provides a means of determining how long it took for a ‘lock-down’ measure to become effective within a location, and it gives experts an idea of what their mitigation strategy should be, given their circumstances.
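To make the idea of re-basing the time axis concrete, here is a minimal sketch in Python. The function name `days_since_threshold`, the threshold of 100 cases, and all of the case counts are made up for illustration; they are not taken from any real chart or data set.

```python
from datetime import date, timedelta


def days_since_threshold(dates, cumulative_cases, threshold=100):
    """Re-index a series so that day 0 is the first day on which the
    cumulative case count reaches the threshold (an assumed cut-off)."""
    aligned = []
    start = None
    for day, cases in zip(dates, cumulative_cases):
        if start is None and cases >= threshold:
            start = day  # first day at or above the threshold
        if start is not None:
            aligned.append(((day - start).days, cases))
    return aligned


# Two hypothetical locations with identical growth, two weeks apart.
cases = [40, 80, 160, 320, 640]
dates_a = [date(2020, 3, 1) + timedelta(days=i) for i in range(5)]
dates_b = [date(2020, 3, 15) + timedelta(days=i) for i in range(5)]

aligned_a = days_since_threshold(dates_a, cases)
aligned_b = days_since_threshold(dates_b, cases)

# Plotted by calendar date the curves would sit two weeks apart;
# once re-based, they lie exactly on top of each other.
print(aligned_a == aligned_b)
```

With the calendar dates removed, the two made-up locations are visibly on the same path, which is the comparison these charts are designed to make easy.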

2. The Logarithmic Scale

As with the x-axis, the y-axis is not your typical scale. The majority of these graphs use a logarithmic, or log, scale. With this, each major mark on the y-axis is 10 times the one before it; that is, the y-axis will read 10, 100, 1,000 and so on. Log scales are useful because they make it easy to see rates of change, they handle data that is skewed towards large values, and they allow a wide range of data to be displayed. This multiplicative property, however, means that equal increases in the number of cases do not take up the same amount of space on the graph. The area representing the first 1,000 cases will seem huge, while the second thousand will look far smaller, and the third smaller still. This means that the higher the number is on the graph, the more squished it will look. Thus, it could seem like the growth in cases in a location is rapidly slowing down, or even that it has stopped, when in fact this is a result of the scaling. So then why not just use a linear scale? Well, because it would do a poor job of representing the insights that experts are looking for. The linear scale might make it seem like some locations are in a worse-off position than others because the curve might look steeper, when in fact both locations could be on the same trajectory.
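The squishing effect can be checked with a little arithmetic. The sketch below measures how much vertical space a range of case counts occupies on a base-10 log axis, in units of "decades" (the distance between 10 and 100, or between 100 and 1,000). The helper name `log_axis_height` is hypothetical, and the first range starts at 1 case because a log axis cannot show zero.

```python
import math


def log_axis_height(lower, upper):
    """Vertical distance between two values on a base-10 log axis,
    measured in decades (one decade = one factor of 10)."""
    return math.log10(upper) - math.log10(lower)


first_thousand = log_axis_height(1, 1_000)       # three full decades
second_thousand = log_axis_height(1_000, 2_000)  # about 0.30 of a decade
third_thousand = log_axis_height(2_000, 3_000)   # about 0.18 of a decade

# Equal *ratios* take up equal space: doubling from 100 to 200 is the
# same height as doubling from 1,000 to 2,000.
doubling_small = log_axis_height(100, 200)
doubling_large = log_axis_height(1_000, 2_000)

print(first_thousand, second_thousand, third_thousand)
print(math.isclose(doubling_small, doubling_large))
```

This is why a location whose cases keep doubling at a steady pace appears as a straight line on a log scale, and why a curve that merely bends is not necessarily flat in absolute terms.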

3. The Data Represented

Lastly, these graphs depict the number of confirmed cases within a location. This value is largely determined by the number of persons being tested, the accuracy of the tests, and how long it takes for the results of a test to be processed. These three points might seem either intuitive or very confusing, so let me break them down.

The more persons a country tests, the more information that country has about the state of the coronavirus within its borders. But as the amount of testing in a country increases, so does the number of confirmed cases (if the coronavirus is spreading within that area). Therefore, countries which carry out less testing than others, or which do no testing at all, will report a lower number of confirmed cases than those which are actively testing, even though the true number of infections might not be lower.

The test’s accuracy refers to how well it reports positive for a person who actually has Covid-19, and negative for a person who does not. At this stage, we only know whether a person has Covid-19 based on the results of the test. But consider a scenario where the test produces a ‘false positive’ 80% of the time. It would mean that of the persons tested who are not actually sick, 80% will be recorded as being sick, causing the number of confirmed cases to be drastically over-reported.

Finally, locations which have a faster turnaround for producing results will have a more up-to-date estimate of the number of infected persons than locations with a slower one. Given all of this, it might not be appropriate to directly compare confirmed cases across locations. To get the most insightful comparisons, it would be useful to compare locations with similar circumstances, such as rate of testing, population size, or even topography.
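The false-positive scenario above can be worked through with some simple arithmetic. The sketch below is only an illustration: the function `confirmed_cases`, the figure of 1,000 persons tested, the 100 true infections, and the assumption that every true infection is detected are all invented for the example. Only the 80% false-positive rate comes from the scenario described.

```python
import math


def confirmed_cases(tested, truly_infected, sensitivity, false_positive_rate):
    """Expected number of positive results among the persons tested.

    sensitivity: share of truly infected persons who test positive.
    false_positive_rate: share of non-infected persons who test positive.
    """
    true_positives = truly_infected * sensitivity
    false_positives = (tested - truly_infected) * false_positive_rate
    return true_positives + false_positives


# Hypothetical location: 1,000 persons tested, 100 of them truly infected,
# every true infection detected, and an 80% false-positive rate.
reported = confirmed_cases(
    tested=1_000,
    truly_infected=100,
    sensitivity=1.0,
    false_positive_rate=0.8,
)

# Roughly 820 confirmed cases would be recorded against 100 true infections.
print(round(reported))
```

Under these made-up numbers, the chart would show about eight times as many cases as actually exist, even though nothing about the spread of the virus has changed.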

The main takeaway is that the curves within these graphs are not a true representation of how Covid-19 is spreading within a location, but a rough estimate that experts are using to inform their plans. When looking at these graphs, persons should take the time to analyze exactly what is being represented instead of making uninformed decisions. Although the curve for a location can seem to be ‘flat’, just remember, looks can be deceiving.
