The genetic lottery of Covid-19 infection

I read a recent paper reporting a connection between severe Covid infection in young people and genetic abnormalities in a protein called Toll-like Receptor 7 (TLR7). The toll-like receptors (TLRs) are highly conserved proteins that belong to what is called the innate immune system. By this point in the pandemic you've all had a crash course in immunology, and you know that in response to infection we make antibodies. That is the adaptive immune system; the innate immune system kicks in first. Call it the first line of defense. That's where TLRs come in.

For decades we have known that TLRs are part of this early innate immune response in us and in many other animals. Lately it has become clear that mutations in these genes can lead to susceptibility to certain infectious illnesses when the mutations lower the function of the gene. If the innate immune system doesn't get started right, the invader has a chance to grow in numbers before the adaptive immune system can kick in. There can also be problems if a TLR variant increases the function of the gene, but that's another story for another day.

Different TLRs respond to different infections, and we know that at least TLR7 responds to SARS-CoV-2, the virus that causes Covid-19. No matter how healthy you are, you can be walking around with variations in some TLR that could increase your susceptibility to a disease. You'd never know it until that virus or bacterium came along. There's no way to know unless you look.

The authors of that study did what's called targeted sequencing on the TLR genes of the young people with severe infection they were studying and found novel mutations in TLR7. You might be wondering, how common is something like that? Well, no one can predict how many people are walking around with a mutation no one has ever seen before, but we can look at known mutations of TLR7 and see how common they are. We looked at the TLR7 gene in a large database called the 1000 Genomes Project.
Though it is named 1000 Genomes, it actually contains about 2500 genomes from people around the world. We looked at all the data for TLR7. There were 38 variants found in those 2500 people, and of those, 17 were a big enough change to alter the amino acid structure of the protein that the gene makes. We looked up each of these 17 in the ClinVar database and all were 'unknown.' That is, no one had reported them as either pathogenic or benign. It's important to note, though, that this doesn't tell us a lot. Most reports are for what are called 'monogenic Mendelian diseases.' That's a lot of words to mean that one change in a gene leads to one illness. It doesn't take into account things like genes interacting with each other or with the environment, including infectious illnesses, so we figured we'd have to look deeper.

There's a way that scientists have found of looking at a position in the genome across time and species and determining how much allowance evolution makes for variation at that position. This shows up as the pressure toward 'rejected substitutions' at a location, in a score called the GERP++_RS. We'll just call it the GERP for simplicity. The common cutoff for the GERP is 2: positions scoring below that are likely to change a lot over evolutionary time, and positions scoring above that are not. If evolution doesn't change something over long periods of time, it is probably very important. Eleven of the 17 had GERP scores over 2. Let's look some more at those.

There's an algorithm one can use to roughly predict which changes in a protein are deleterious and which are not. We used the Sorting Intolerant From Tolerant (SIFT) algorithm to take a look at those 11 variants that evolution didn't want changed. SIFT is scored from 0 to 1, with a cutoff around 0.05 as the 'danger line': scores below it are predicted to be deleterious. Among the 11, we found four variants of interest. Three had SIFT scores well below 0.05 and one was just above at 0.07. To confirm our list we ran these four through a different algorithm.
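For readers who like to see the logic spelled out, the two filters described so far (GERP over 2, then SIFT under about 0.05) can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling we actually used. The `Variant` class, the `shortlist` function and the example entries are all made up for the sketch; the example scores are placeholders, not real TLR7 data.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Variant:
    name: str       # placeholder label, NOT a real rsID
    gerp_rs: float  # GERP++_RS conservation score
    sift: float     # SIFT score, from 0 to 1


GERP_CUTOFF = 2.0   # above this, the position is treated as conserved
SIFT_CUTOFF = 0.05  # below this, the change is predicted to be deleterious


def shortlist(variants: List[Variant],
              gerp_cutoff: float = GERP_CUTOFF,
              sift_cutoff: float = SIFT_CUTOFF) -> List[Variant]:
    """Keep variants at conserved positions that SIFT predicts to be deleterious."""
    return [
        v for v in variants
        if v.gerp_rs > gerp_cutoff and v.sift < sift_cutoff
    ]


# Invented example values, purely to exercise the filter.
EXAMPLES = [
    Variant("example_a", gerp_rs=4.8, sift=0.01),  # conserved and deleterious
    Variant("example_b", gerp_rs=0.5, sift=0.01),  # position not conserved
    Variant("example_c", gerp_rs=3.9, sift=0.40),  # conserved but tolerated
    Variant("example_d", gerp_rs=5.1, sift=0.07),  # borderline SIFT score
]

if __name__ == "__main__":
    for v in shortlist(EXAMPLES):
        print(v.name, v.gerp_rs, v.sift)
```

With the strict 0.05 cutoff, a borderline case like `example_d` drops out. Keeping a variant that sits just above the line, as we did with the one at 0.07, amounts to a judgment call, which in this sketch would mean passing a looser `sift_cutoff`.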
The Combined Annotation Dependent Depletion (CADD) algorithm uses a diverse range of metrics to rank all known variations in relation to their predicted deleterious nature. Its score can be scaled in an interesting logarithmic way, where every 10 points is another factor of ten. A score of 10 means that the variant is in the top 10% for dangerousness. A score of 20 means it's in the top 1%, and a score of 30 means the top 0.1%. Here are the four variants we found, with their CADD_phred scores and minor allele frequencies (MAF).

rs759793723; CADD 23.1; MAF 0.00026
rs748065199; CADD 23.7; MAF 0.00026
rs181600414; CADD 32; MAF 0.00026
rs138079334; CADD 32; MAF 0.00026

In essence each of them ranks high in predicted dangerousness and each was seen about once in the 1000 Genomes database. We know there are more than just these 4 since the original paper showed 2 more, but at least these 4 are common enough to be found in a sample as small as 2500 people. If each of these occurs in the general population at the frequency seen in the chart, about one in 1000 people has at least 1 of them (4 x 0.00026 is roughly 0.001). That gives us a floor of 0.1% of people who have won, or lost, the genetic lottery when it comes to Covid-19. It could be a higher number, and of course we've all heard of the non-genetic factors involved (age, obesity, smoking, etc.). But the takeaway here is that no matter what our age or health status, none of us can be sure we don't harbor some TLR variation, or a variation in some other part of the innate immune system, that can predispose us to a severe case if we are exposed to an infectious agent that others tolerate relatively well. The best course is to not be exposed.

Is there more news to come? Almost undoubtedly. There will be other TLR7 variants found that affect severity of illness with Covid-19. Will we find that those other risk factors are modulated via TLR7? Quite possibly. In addition, both TLR7 and TLR8 react to the same chemical: single-stranded RNA. So it may be that we'll be hearing about new variants of TLR8 as well.
As it is, while we found the 4 TLR7 variants described here, we also found 9 in TLR8 that meet the same criteria. More to come.
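For anyone who wants to check the arithmetic behind the CADD scaling and the 0.1% floor, here is a rough Python sketch. The function names are ours, invented for illustration. The calculation follows the same simplification as the text: it treats each listed frequency as the per-person chance of carrying that variant and assumes the four variants turn up independently of one another. The scores and frequencies are the ones listed above.

```python
from typing import Iterable

# rsID -> (CADD_phred score, MAF), copied from the list in this post.
TLR7_VARIANTS = {
    "rs759793723": (23.1, 0.00026),
    "rs748065199": (23.7, 0.00026),
    "rs181600414": (32.0, 0.00026),
    "rs138079334": (32.0, 0.00026),
}


def cadd_phred_to_top_fraction(phred: float) -> float:
    """Convert a phred-scaled score to the 'top fraction' it stands for.

    10 -> 0.1 (top 10%), 20 -> 0.01 (top 1%), 30 -> 0.001 (top 0.1%).
    """
    return 10 ** (-phred / 10)


def fraction_with_at_least_one(frequencies: Iterable[float]) -> float:
    """Chance of carrying at least one variant, assuming independence."""
    chance_of_none = 1.0
    for f in frequencies:
        chance_of_none *= 1.0 - f
    return 1.0 - chance_of_none


if __name__ == "__main__":
    for rsid, (phred, maf) in TLR7_VARIANTS.items():
        top = cadd_phred_to_top_fraction(phred)
        print(f"{rsid}: top {top:.3%} by CADD, MAF {maf}")

    floor = fraction_with_at_least_one(maf for _, maf in TLR7_VARIANTS.values())
    print(f"Fraction with at least one of the four: {floor:.4%}")
```

Because the frequencies are so small, the result is almost exactly the simple sum of the four, which is where the "about one in 1000" figure comes from.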