Hint: It's not everything you paid for

You've all seen the ads for direct-to-consumer (DTC) genetic testing. Several companies offer it. They'll take your sample and send you the results of the genotyping assays they've worked out. That may be a large number of variations (around 600,000) or a small, focused panel of about 40 to 60. You send them the money and the sample, and they send you the data. Simple, right?

Yes and no. It is a simple transaction, and there is data there. But data isn't the answer to your question. Data is what their machine spits out. The answer to your question is information, and that's a little harder to come by.

Each genetic testing company has a report it sends you or gives you access to on the web. It may cover their whole panel or only a subset. Ideally, the report will tell you whether you have the non-variant form (wild type) and, if you have the variation, what it means. You'll want to know how common it is, what happens to people who have it, and what it means for your health. And a lot of companies' reports tell you those things. So far, so good.

But genetics is far more complicated than the products we've been sold so far. There are gene-gene interactions and gene-environment interactions that aren't taken into consideration. There's the context in which the variation occurs; that is, how other variations change the action of that variation. So a test might tell you that you are at higher risk for an illness when you aren't, and you may change your lifestyle, or do something even more drastic, in response. Alternatively, you may get a result that tells you that you are at lower risk when you aren't. In short, we need more than we're getting.

The future is now

All the things I've said above apply to genotyping, the targeted measuring of changes at individual known locations. That's one technique of genetic testing. But there's a whole other world out there, and it's a lot bigger than genotyping.
We've been limited to genotyping by cost, but the cost of this other technique is falling rapidly. So get ready for the future: sequencing.

With sequencing, we don't just read one spot to see if it is different. We read an entire stretch to see what's different. It's much more of a discovery process than genotyping, because we may find variation at places in the genome where no one has seen variation before. Instead of measuring one base pair in a gene of 17,000 base pairs, we're looking at all of them. It's a lot more data. And it could be a lot more information. One day, it will be. But today, not so much. People have gotten so used to genotyping that the sequencing industry has found a way to take a lot of the potential information out of its reports.

Sequencing reports

About 5 years ago, the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP) published a paper setting standards for how genetic labs report data. Among the points the paper makes is that only members of the authoring societies should be able to tell people what their data means. Now who would have predicted that?

They also set up a complicated algorithm that assigns each variant to one of five categories: Benign, Likely benign, Uncertain significance, Likely pathogenic, and Pathogenic. If that looks familiar, it should. It is the five-point Likert scale so popular in psychology and polling: strongly agree, agree, neither agree nor disagree, disagree, strongly disagree. The Likert scale is popular because it makes sense to the primate brain. I either care or I don't. I either care a little or a lot. Five points to simplify any situation. In essence, the ACMG criteria map primate psychology onto molecular biology, and a lot is lost in the translation.

But remember back to genotyping, where we were looking at only one change and one disease and paying no attention to interactions? Well, we still aren't paying attention to them here.
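To give a feel for how that algorithm turns evidence into one of the five categories, here is a minimal Python sketch of the evidence-combining rules as we read them in the 2015 ACMG/AMP guideline (Richards et al.). It assumes a lab has already decided which evidence codes apply (PVS1, PS1-PS4, PM1-PM6 and PP1-PP5 on the pathogenic side; BA1, BS1-BS4 and BP1-BP7 on the benign side). Deciding that is most of the real work and isn't shown. The function names are our own, and conflicting evidence is handled in a simplified way. It's an illustration, not a clinical tool.

```python
import re
from collections import Counter

# Illustrative sketch of the ACMG/AMP (2015) rules for combining evidence.
# Input: evidence codes a curator has already assigned, e.g. ["PVS1", "PM2"].


def count_strengths(codes):
    """Count evidence codes by strength prefix: "PM2" -> "PM", "BS1" -> "BS"."""
    return Counter(re.sub(r"\d+$", "", code.strip().upper()) for code in codes)


def meets_pathogenic(n):
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    if pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2):
        return True
    if ps >= 2:
        return True
    return ps == 1 and (pm >= 3 or (pm == 2 and pp >= 2) or (pm == 1 and pp >= 4))


def meets_likely_pathogenic(n):
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    return (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )


def meets_benign(n):
    # BA1 (allele frequency above 5%) is enough on its own.
    return n["BA"] >= 1 or n["BS"] >= 2


def meets_likely_benign(n):
    return (n["BS"] >= 1 and n["BP"] >= 1) or n["BP"] >= 2


def classify(codes):
    n = count_strengths(codes)
    if meets_pathogenic(n):
        path_call = "Pathogenic"
    elif meets_likely_pathogenic(n):
        path_call = "Likely pathogenic"
    else:
        path_call = None
    if meets_benign(n):
        benign_call = "Benign"
    elif meets_likely_benign(n):
        benign_call = "Likely benign"
    else:
        benign_call = None
    # Simplification: if both sides reach a call, treat the evidence as conflicting.
    if path_call and benign_call:
        return "Uncertain significance"
    return path_call or benign_call or "Uncertain significance"
```

For example, classify(["PVS1", "PM2"]) returns "Likely pathogenic", while classify(["BA1"]) returns "Benign" on a single piece of population-frequency evidence. Notice that nothing in these rules has a slot for gene-gene or gene-environment interactions.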
ACMG is still focused on what are referred to as Mendelian illnesses. That is, one gene, one disease. Gene-gene interactions don't fit well into the five categories. Nor do risk factors for other conditions, or variations that don't do anything by themselves until a certain environmental stimulus comes along. So a lot of the benefit of sequencing gets sequenced out by ACMG-based reporting. In the end we have a list of variants that looks a lot like a genotyping report. But sequencing is a lot more expensive, so shouldn't we get more? Yes, and we can.

The future is tomorrow

Every sequencing run produces a number of files for every sample. In the end, though, the analysis produces a file in Variant Call Format (VCF). It lists the places in the regions measured where the sample differs from the reference sequence: hundreds of variants for a small panel, and far more for an exome or a whole genome. That sounds great, right? All the variations and what they all mean to me. That's what I want.

Well, not so fast. By convention, all the variations that are Benign or Likely benign are filtered out of the report. And the ACMG criteria take out most of the rest.

An example is a variant in APOE, a gene involved in cholesterol levels. The variant is well known and has an rsID if you want to look it up (rs7412). Its less common allele has a frequency of about 8%, and ACMG treats an allele frequency above 5% as stand-alone evidence that a variant is benign for Mendelian disease. So if something is that common, it can't be called pathogenic. If you were to get a sequencing test of APOE from a lab that follows ACMG, you'd probably never find out whether you had a variation at rs7412. Even though OMIM considers it pathogenic, ClinVar lists it as pathogenic, and PharmGKB considers it a very important variant for the use of certain cholesterol-lowering medications, you and your doctor will probably never know you have it.

But that's the present, and I told you we'd be talking about the future. Wouldn't it be nice if labs did what the DTC testing companies do and gave you not only their report but the raw data as well?
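Here is a minimal Python sketch of what the data lines in a VCF look like, and how a simple frequency cutoff removes a variant like rs7412 before anyone reads about it. It assumes the file has been annotated with a population allele frequency in an INFO key we've called POP_AF; that key name is hypothetical, and real files and lab pipelines vary. The second record and both frequency values are made up for illustration (0.08 mirrors the figure in the text), and the rs7412 position is given for the GRCh38 reference as we understand it, so check dbSNP before relying on it.

```python
# Toy VCF: meta-information lines (##), a column header (#CHROM), then one variant per line.
# POP_AF is a hypothetical annotation key; the second record is invented for illustration.
TOY_VCF = (
    "##fileformat=VCFv4.2\n"
    '##INFO=<ID=POP_AF,Number=A,Type=Float,Description="Population allele frequency">\n'
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "19\t44908822\trs7412\tC\tT\t.\tPASS\tPOP_AF=0.08\n"
    "19\t44909100\t.\tG\tA\t.\tPASS\tPOP_AF=0.0003\n"
)


def parse_vcf(text):
    """Return one dict per data line, using the eight fixed VCF columns."""
    records = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip meta-information and the column header
        chrom, pos, vid, ref, alt, _qual, flt, info = line.split("\t")[:8]
        info_fields = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_fields[key] = value  # flags with no "=" get an empty string
        records.append({"chrom": chrom, "pos": int(pos), "id": vid,
                        "ref": ref, "alt": alt, "filter": flt, "info": info_fields})
    return records


def drop_common(records, max_af=0.05):
    """Keep records whose first listed POP_AF is at or below max_af (a BA1-style cutoff).
    Records without a frequency are treated as rare and kept."""
    kept = []
    for rec in records:
        af_text = rec["info"].get("POP_AF") or "0"
        af = float(af_text.split(",")[0])
        if af <= max_af:
            kept.append(rec)
    return kept


variants = parse_vcf(TOY_VCF)
print([v["id"] for v in variants])               # ['rs7412', '.']
print([v["id"] for v in drop_common(variants)])  # ['.']
```

The variant is still sitting in the file; it just never makes it past the cutoff. That's why having the VCF itself matters.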
If you're paying $1,000 for a test (or your insurance company is), shouldn't you get all the data the test created? And if the lab can't give you all the information in that data, wouldn't you like someone else to offer a site where you could upload your data and get the answers you need?

So you're probably wondering, "Can I send my VCF file to GenEd for that information?" The answer today is no, but not because we can't do it. We could. But almost no one has their VCF file. The sequencing companies keep them, and we don't want to put resources into building a tool that no one can use. But maybe we're wrong and you have your VCF file. If you want this tool, let us know; maybe the demand is there. In the meantime, if you want us to look at your VCF file, drop us a line. We're happy to help.
Or any company, for that matter

Any company needs capital to start, and so every company has investors. Those investors need to be paid back and rewarded for their risk. That means the goal of every company is to 'increase throughput now and in the future.' Simple, right? Just make more money.

Well, it may seem that simple at first glance. To make more money now and in the future, you need to make something to sell, and you need to sell it to someone. What a lab makes is information that answers people's questions about themselves, and we'll see in a bit how important those people are.

Let's start, though, with how the lab produces the information. A lab is a highly technical undertaking, and a genetics lab even more so. The equipment, the techniques, and the bioinformatics needed to decipher the data all require highly trained and attentive staff. And that's just to get any throughput at all. To get increasing throughput, your staff need to believe in the mission of the company. They have to be dedicated and innovative. Only then will they bring forward improvements that increase throughput. Any company that wants increasing throughput now and in the future needs a stable, dedicated staff, which means minimizing staff turnover. So you have to do more than make money; you have to make your company an engaging and supportive place to work. Otherwise, you'll experience high turnover and spend more time getting less throughput.

Let's say you've done that and you have great people doing great things. Now we'll look at who is going to buy your product. Those great people will massage and tickle those expensive machines you bought so that they produce data. In a next-generation sequencing (NGS) lab, it will be an impressive amount of data. But that's all it is: data. Your customers may indeed be impressed with reams of data. They may have some abstract sense of how wonderful it is that anyone can do what you do.
Produce data, and you'll have their applause. Applause, though, is not throughput. People only buy what helps them, which means you're going to have to solve a problem. The problem your potential customers have is that they have questions about themselves that they can't answer. So in addition to producing reams of data, if you want people to buy your products in increasing amounts, you're going to have to satisfy their needs. You'll have to turn the data into information that answers THEIR questions, not the questions YOU want to answer.

Every company wants to increase throughput now and in the future, but to do that reliably, every company has to create an engaging and supportive environment for its staff and dedicate itself to satisfying the market. One goal becomes three. We all know it's a lot easier to throw one ball up and catch it than it is to juggle three.

When money or time is tight, there will be a tendency to prioritize throughput at the expense of either the employees' environment or the market's satisfaction. It will seem expedient. It will seem rational. But it will be wrong. It is wrong because shortchanging either of those two for a short-term increase in throughput only borrows that throughput from the future. If our goal were to make money and run, that might be sufficient, but the goal is to increase throughput now and in the future. That means each bit of throughput must rest on stable foundations so that additional throughput can be built on top of it. Sacrificing the employees' experience or the market's satisfaction is like trying to build on shifting sand. You may make progress, but it will eventually fall.

Any successful and sustainable genetics lab will have this three-pronged strategy. Those that ignore the market's demands and their employees' needs will create a great flash and vanish quickly.
Those that forget that the reason for meeting the market's and employees' needs is to increase throughput will die on the vine. Using the Theory of Constraints (TOC) Thinking Processes, any company can use this strategic structure to create the production environment it needs to meet its goal. If you need help with that, GenEd is here for just that reason. We'd like to see more small labs succeed at their goal of increasing throughput, now and in the future.
The genetic lottery of Covid-19 infection

I read a recent paper reporting a connection between severe Covid infection in young people and genetic abnormalities in a protein called Toll-like Receptor 7 (TLR7). The Toll-like receptors (TLRs) are highly conserved proteins that belong to what is called the innate immune system.

By this point in the pandemic you've all had a crash course in immunology, and you know that in response to infection we make antibodies. That is the adaptive immune system; the innate immune system kicks in first. Call it the first line of defense. That's where TLRs come in. For decades we have known that TLRs are part of this early innate immune response in us and many other animals. More recently it has become clear that mutations that lower the function of these genes can make people susceptible to certain infectious illnesses. If the innate immune system doesn't get started right, the invader has a chance to grow in numbers before the adaptive immune system can kick in. There can also be problems if a TLR variant increases the function of the gene, but that's a story for another day.

Different TLRs respond to different infections, and we know that at least TLR7 responds to SARS-CoV-2, the virus that causes Covid-19. No matter how healthy you are, you could be walking around with variations in some TLR that increase your susceptibility to a disease. You'd never know it until that virus or bacterium came along. There's no way to know unless you look.

The authors of that first study did what's called targeted sequencing on the TLR genes of the young people with severe infection they were studying, and they found novel mutations in TLR7. You might be wondering, how common is something like that? Well, no one can predict how many people are walking around with a mutation no one has ever seen before, but we can look at known mutations of TLR7 and see how common they are. We looked at the TLR7 gene in a large database called the 1000 Genomes Project.
Though it is named 1000 Genomes, it actually contains about 2,500 genomes from people around the world. We looked at all the data for TLR7. There were 38 variants in those 2,500 people, and 17 of them changed the amino acid sequence of the protein the gene makes. We looked up each of these 17 in the ClinVar database, and all were 'unknown.' That is, no one had reported them as either pathogenic or benign.

It's important to note, though, that this doesn't tell us much. Most reports are for what is called 'monogenic Mendelian disease.' That's a lot of words to mean that one change in the gene leads to one illness. It doesn't take into account things like genes interacting with each other or with the environment, including infectious illnesses, so we figured we'd have to look deeper.

Scientists have found a way of looking at a position across time and species and determining how much variation evolution allows at that position in the genome. The score, called GERP++_RS, measures 'rejected substitutions': how many fewer changes have occurred at that position across species than we would expect if it didn't matter. We'll just call it the GERP for simplicity. The common cutoff for the GERP is 2: positions scoring below it tolerate change, and positions scoring above it do not. If evolution doesn't change something over long periods of time, it is probably very important. Eleven of the 17 had GERP scores over 2. Let's look some more at those.

There are algorithms that roughly predict which changes in a protein are deleterious and which are not. We used the Sorting Intolerant From Tolerant (SIFT) algorithm to look at the 11 variants that evolution didn't want changed. SIFT scores run from 0 to 1, with a cutoff of about 0.05 as the 'danger line'; lower scores are predicted to be deleterious. Among the 11, we found four variants of interest. Three had SIFT scores well below 0.05, and one was just above, at 0.07. To confirm our list, we looked at these four with a different algorithm.
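The two filtering steps above can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the variant names and scores are hypothetical placeholders, not the 1000 Genomes values we used, and the SIFT cutoff is a parameter because we kept one variant at 0.07, just above the usual 0.05 line.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str       # hypothetical label, not a real rsID
    gerp_rs: float  # GERP++ rejected-substitutions score (higher = more constrained)
    sift: float     # SIFT score, 0 to 1 (lower = more likely deleterious)


def constrained(variants, min_gerp=2.0):
    """Step 1: keep positions evolution appears to protect (GERP++_RS above the cutoff)."""
    return [v for v in variants if v.gerp_rs > min_gerp]


def predicted_damaging(variants, max_sift=0.05):
    """Step 2: keep amino acid changes SIFT predicts are not tolerated (score at or below the cutoff)."""
    return [v for v in variants if v.sift <= max_sift]


candidates = [
    Variant("variant_A", gerp_rs=5.1, sift=0.01),
    Variant("variant_B", gerp_rs=4.2, sift=0.07),
    Variant("variant_C", gerp_rs=0.8, sift=0.00),  # fails step 1: position is not conserved
    Variant("variant_D", gerp_rs=3.3, sift=0.40),  # fails step 2: change predicted tolerated
]

step1 = constrained(candidates)
strict = predicted_damaging(step1)                 # the standard 0.05 line
lenient = predicted_damaging(step1, max_sift=0.1)  # also keeps a borderline score like 0.07
print([v.name for v in strict])
print([v.name for v in lenient])
```

Running the cheap conservation filter first and the protein-level prediction second mirrors the order described above; the order doesn't change the final list, only how many variants each step has to look at.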
The Combined Annotation Dependent Depletion (CADD) algorithm uses a diverse range of metrics to rank all possible variations by how deleterious they are predicted to be. Its scaled score is logarithmic: a score of 10 means the variant is in the top 10% for predicted dangerousness, a score of 20 means the top 1%, and a score of 30 means the top 0.1%. Here are the four variants we found, with their scaled CADD scores (CADD_phred) and minor allele frequencies (MAF):

rs759793723; CADD 23.1; MAF 0.00026
rs748065199; CADD 23.7; MAF 0.00026
rs181600414; CADD 32; MAF 0.00026
rs138079334; CADD 32; MAF 0.00026

In essence, each of them ranks high in predicted dangerousness, and each was seen about once in the 1000 Genomes database. We know there are more than just these 4, since the original paper showed 2 more, but at least these 4 are common enough to be found in a sample as small as 2,500 people. If each of these occurs in the general population at the frequency in the list above, about one in 1,000 people has at least one of them. That gives us a floor of 0.1% of people who have won, or lost, the genetic lottery when it comes to Covid-19. It could be a higher number, and of course we've all heard of the non-genetic factors involved (age, obesity, smoking, etc.). But the takeaway here is that no matter what our age or health status, none of us can be sure we don't harbor some TLR variation, or variation in any other part of the innate immune system, that could predispose us to a severe case if we are exposed to an infectious agent that others tolerate relatively well. The best course is to not be exposed.

Is there more news to come? Almost undoubtedly. Other TLR7 variants that affect the severity of Covid-19 will be found. Will we find that those other risk factors are modulated via TLR7? Quite possibly. In addition, both TLR7 and TLR8 react to the same chemical: single-stranded RNA. So we may be hearing about new variants of TLR8 as well.
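The arithmetic in that paragraph can be checked with a short Python sketch. The scaled CADD score is a phred-style rank, so 10 raised to (minus score / 10) gives roughly the fraction of possible changes ranked as more deleterious. The 'at least one' estimate is our own simplification: it assumes the four variants occur independently and counts one copy of each site. TLR7 sits on the X chromosome, so that matches a male; a female carries two copies.

```python
import math

# Scaled CADD scores and minor allele frequencies from the list above.
VARIANTS = {
    "rs759793723": (23.1, 0.00026),
    "rs748065199": (23.7, 0.00026),
    "rs181600414": (32.0, 0.00026),
    "rs138079334": (32.0, 0.00026),
}


def cadd_top_fraction(phred):
    """Phred scaling: 10 -> top 10%, 20 -> top 1%, 30 -> top 0.1%."""
    return 10 ** (-phred / 10)


def carrier_probability(mafs):
    """Chance of carrying at least one of the variants on a single copy of the gene,
    assuming the variants occur independently of each other."""
    none_present = math.prod(1 - maf for maf in mafs)
    return 1 - none_present


for rsid, (phred, _) in VARIANTS.items():
    print(f"{rsid}: top {100 * cadd_top_fraction(phred):.3f}% by CADD")

p = carrier_probability(maf for _, maf in VARIANTS.values())
print(f"at least one variant: {p:.5f} (about 1 in {round(1 / p)})")
```

With these inputs the estimate comes out just over 0.1%, roughly 1 in 960, which is where the 'about one in 1,000' floor comes from. At frequencies this low, simply adding the four MAFs gives nearly the same answer.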
As it is, besides the 4 TLR7 variants described here, we also found 9 in TLR8 that meet the same criteria. More to come.
We're going to be releasing our COMT Story soon. COMT is an important gene in the metabolism of neurotransmitters like dopamine and epinephrine. The GenEd COMT Story will tell you how your individual genotype at a specific location in COMT matters for dopamine metabolism and reward behaviors. As we work up to that release, we'll post about how GenEd builds a Story. Today's post is about the original description of the most famous SNP in COMT, rs4680.

The SNP rs4680 is a favorite of companies that sell individualized genotyping reports. We've long suspected this is because of the great marketing tag almost always applied to COMT genotype at rs4680: "Warrior/Worrier". According to an excellent review article by Stein et al. (1), David Goldman was the source of this wonderfully appealing dichotomy. Rather than repeat the excellent narrative in the Stein et al. paper, we're going back to the original peer-reviewed description of what would become the famous COMT SNP, rs4680 (2). Note that this paper, even after 20+ years in the literature, is still behind a paywall. Alas, that is not our problem to fix. Let's dissect what is freely available: the abstract.

The authors assume we know enough biochemistry to parse the acronym COMT. COMT stands for Catechol-O-Methyl-Transferase. "Catechol" is the chemistry word for molecules with a benzene ring carrying two hydroxyls on neighboring carbons. Benzene is a six-carbon ring with alternating double bonds. A hydroxyl is the small chemical group -OH, an oxygen hooked to a hydrogen. The "O" is another chemistry term. In this context, it stands for the oxygen in one of the hydroxyls. "Methyl" is chemist-speak for a carbon with three hydrogens, hooked to the rest of a molecule. "Transferase" is biochemist-speak for an enzyme that transfers, or moves, a chemical group from one molecule to another. In the plainest English this writer can manage, COMT is the enzyme that transfers a methyl group to an oxygen on catechols.
Why is transferring methyl groups to catechols important? Transferring a methyl group to a catecholamine like dopamine inactivates that catecholamine. As the authors point out, COMT doesn't just inactivate catecholamines like dopamine and epinephrine. COMT also inactivates drugs like L-DOPA. COMT's activity varies between people: John Smith might have more active COMT than Joe Public. This is very important to people who might need L-DOPA. Depending on which studies you look at, it's probably also important for a host of other human diseases that involve catecholamine metabolism.

This inter-individual variability in COMT was known before this paper. The authors' major contribution was figuring out WHY COMT activity varies between people.

Note the date on reference 2. This study was done in the bad old days before gene sequencing was cheap and common. In the mid-1990s, biologists had to climb up snow-covered hills both ways to get to their PCR machines! Lachman et al. designed an experiment that takes advantage of two common techniques from the pre-sequencing days of molecular biology: PCR and restriction enzymes.

PCR is a technique for making many copies of a single chunk of DNA, starting from a small amount. Lachman et al. PCRed "up" lots of COMT gene DNA from different people they already knew had different COMT enzyme activities.

Restriction enzymes are tiny molecular machines that bacteria use to defend themselves against viruses. Biologists in the dim recesses of time learned how to use these restriction enzymes to cut DNA in specific places. A restriction enzyme cuts DNA cleanly at one exact sequence; if the PCRed piece of the gene contains one such site, the enzyme cuts it into exactly two fragments. The restriction enzyme used by Lachman et al., Nla III, is one of these narrow-specificity restriction enzymes.
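A restriction digest is simple enough to sketch in Python. This toy assumes only the enzyme's four-letter word, CATG, and the fact that Nla III cuts just after the G. The sequences are made up and are not the real COMT fragment, and we track fragment lengths on one strand only. It shows how a single changed letter turns a cut fragment into an uncut one, and what a person with one copy of each version would show.

```python
SITE = "CATG"   # Nla III recognition sequence
CUT_OFFSET = 4  # Nla III cuts after the G: CATG^


def cut_positions(seq, site=SITE, offset=CUT_OFFSET):
    """Positions (0-based, on this strand) where the enzyme cuts."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq):
    """Lengths of the pieces left after digestion, in order along the strand."""
    bounds = [0] + cut_positions(seq) + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:]) if end > begin]


def band_pattern(allele_1, allele_2):
    """Distinct fragment sizes a person with these two copies would show on a gel."""
    sizes = set(fragment_lengths(allele_1)) | set(fragment_lengths(allele_2))
    return sorted(sizes, reverse=True)


# Hypothetical 14-letter stretches that differ at one position (T vs G).
with_site = "TTTTCATGTTTTTT"
without_site = "TTTTCAGGTTTTTT"

print(fragment_lengths(with_site))            # [8, 6]
print(fragment_lengths(without_site))         # [14]
print(band_pattern(with_site, without_site))  # [14, 8, 6]
```

Someone with two cut copies shows only the short bands, someone with two uncut copies shows only the long band, and someone with one of each shows all three. That is the logic of telling genotypes apart by RFLP, although a real gel separates double-stranded fragments of whatever sizes the PCR primers define.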
Think of Nla III as a pair of scissors that only cuts at one specific four-letter spelling of DNA, CATG, snipping just after the G. On the opposite strand, that site pairs with GTAC:

5'-C A T G-3'
3'-G T A C-5'

If you are lucky, like the Lachman team in 1996, the location of your COMT mutation falls within one of these DNA words. Imagine Nla III sliding along the double helix, looking for its magical DNA letters. If a change in the sequence destroys the word, it doesn't cut! (A change can also create a new word where there wasn't one, and then it does cut.) The resulting difference in fragment lengths between people is called a restriction fragment length polymorphism (RFLP), or "riflip".

The Lachman group showed that one of these RFLPs is a misspelling in a part of the COMT gene. The more common G DNA base is misspelled as an A DNA base. This misspelling is carried all the way through to the COMT enzyme itself, resulting in a substitution of the amino acid methionine (Met) for the more common amino acid valine (Val) at position 158 in the COMT sequence. Replacing valine with the bulkier methionine is roughly like asking your car to run on square tires instead of round ones. A car would move forward on square wheels, but it wouldn't be a very smooth ride. COMT enzymes with Val158 ("round tires") inactivate more catecholamines than COMT enzymes with Met158 ("square tires").

Before Lachman et al., no one knew exactly why there was so much variability in COMT activity. Their seminal work is the basis for everything we know about this particular mutation. We'll talk more about what catecholamine inactivation means for Warriors and Worriers in the next blog post. Be sure to follow along . . .

1. Stein DJ, Newman TK, Savitz J, Ramesar R (2006) Warriors versus worriers: the role of COMT gene variants. CNS Spectr 11(10):745–748.
2. Lachman HM, et al. (1996) Human catechol-O-methyltransferase pharmacogenetics: description of a functional polymorphism and its potential application to neuropsychiatric disorders. Pharmacogenetics 6(3):243–250.