Bad Chemistry II – Poisons

In the second part of this series I’m going to have a look at some specific types of poisons and the treatments that are available for them. This will not by any means be an exhaustive reference source, but more an overview of substances that tend to feature prominently in fiction, and which tend to fit a particular type of purpose within a narrative.

Cyanide (Fast) – The exact definition of a cyanide is a little confusing, but for the purposes of toxicology it tends to be used to describe compounds containing the cyanide group (CN). This chemical group is very common in chemistry, but causes problems for many living creatures when released into solution as an ion, something that happens very easily with simple cyanide salts. This ion inhibits the electron transport chain, a key reaction in providing energy for your cells.

Cyanide toxicity is fast, and once it’s present within the bloodstream in large quantities it does pretty much fulfill the Hollywood “drop dead in seconds” image. This is most likely to happen when it is breathed in as a gas, or absorbed through the skin.

As such, the treatment of cyanide poisoning is heavily complicated by the fact that the victim is quite probably already dead by the time help arrives (as my Toxicology lecturer put it, somewhat flippantly, if a living patient presents with Hydrogen Cyanide poisoning then congratulations, they’re going to live). That said, in the case of lower doses, or more slowly absorbed oral poisoning, treatments do exist, mostly involving the introduction of substances into the body that the cyanide will react with. This is not a magic cure-all, but will reduce the severity of the poisoning. Prognosis for survivors is fairly good, but long term damage can result, especially to the heart, brain, and the rest of the nervous system.

Metal Toxicity (Slow) – Toxic metals are often considered together, not because many heavy metals have the same mechanisms of effect, or even because they target the same tissues within the body, but because the body is very poor at excreting them, meaning that they will tend to concentrate within the body over time, a process known as bio-accumulation. Because of this, treatment often involves using substances, known as chelating agents, which will bind to the metal and help it to be eliminated from the body.

This tendency means that such substances have often been (mis)used, throughout history, to poison a victim over an extended period of time, often with the intent of avoiding detection or causing the appearance of a chronic illness. It’s important to remember that their tendency to cause slow long term poisoning does not preclude acute toxicity from occurring from a single large dose, but many of these substances tend to produce fairly general damage to organs, which can make for a long and lingering death even in a large exposure. Neurological damage is very common, especially with lead and mercury.

Another key issue with metal toxicity is the form in which the metal is present. Elemental metals tend to be poorly absorbed by the body and relatively non-toxic, but when incorporated into organic compounds they can become many times more bio-available, and hence toxic. Elemental iron is sufficiently non-toxic that children’s cereal is pretty much “fortified” by mixing in iron filings. Organic iron is highly toxic, which is the reason that the first thing a doctor would establish, if your child had overdosed on vitamins, is whether they contain iron or not. Mercury also presents the risk that it will evaporate, which makes it much more readily absorbed by the body.

Intentional heavy metal poisoning is now rare, at least in the developed world, primarily because it is very easy to detect, but environmental toxicity is still a very big deal, especially in poorer countries, often linked to use of the substances in industry, or their recovery from used electronics. Toxicity may also result from illegal or inappropriate incorporation of the metal into another product, often as a dye.

Anticoagulants (Messy) – This usually means the classic rat poison warfarin, although there are a number of similar drugs, and aspirin can produce the same effects. In toxic concentrations these kill via blood loss, primarily resulting from internal bleeding.  Many anticoagulants merely block coagulant production, something that takes an extended period of time and which can be treated by replacing factor precursors and giving a blood transfusion, making it a relatively slow and unpleasant form of poisoning, but one that would be fairly easy to diagnose and treat where medical help is available.

Botulinum Toxin (Potent) – The most potent toxin that I’m aware of is Botulinum toxin, which is produced by the bacterium Clostridium botulinum. This substance causes the food poisoning known as botulism, and is used under the name Botox in a variety of cosmetic medical procedures, as well as some other medical applications. The toxin as released by the bacteria contains a number of very similar substances which prevent the release of neurotransmitters from nerve junctions. Depending on administration route the lethal dose for a human may be less than 0.1 micrograms (or one ten-millionth of a gram).

The substance represents an obvious candidate for use in warfare or terrorism. Whilst it is not particularly well suited to use as a weapon in terms of stability or absorption, its incredible potency and relatively straightforward manufacturing process make it a very real threat.

Tetrodotoxin (Exotic) – Tetrodotoxin is of interest primarily because it occurs in a wide range of different creatures, many of which have been used prominently in fiction. Tetrodotoxin is responsible for the toxicity of the Japanese pufferfish fugu and the bite of the blue ringed octopus, as well as a wide and varied range of toads, sea stars, fish and worms. This is thought to be possible because the toxin is actually manufactured by one of a number of different bacteria that are living within the creatures.

The toxin acts on the nervous system and serves to paralyse skeletal muscles, usually leading to death as a result of suffocation. The victim can remain perfectly lucid but paralysed until they expire. Whilst it’s true that the toxin has no antidote, this does not mean that it cannot be treated. Treatment primarily consists of supporting the patient’s breathing mechanically and trying to prevent any further absorption of the drug, although serious cardiac symptoms can also occur as the patient loses the ability to regulate their heart rate. If the patient can be kept alive by supporting respiration for at least 24 hours, the effects of the drug will start to wear off, and the patient has an excellent chance of a full recovery.

A quick update

As I have previously indicated, I’ve been very busy over the last few months, moving vocations, building up my CV, and stalking one of those elusive job thingies.

This hunt has now been concluded.

This means that, after a brief period of physical relocation and internet reconnection, I may actually have spare time that I can devote to the website without feeling guilty.

So stay tuned, people, and if any of you have suggestions for articles that you want to see, now is the time to proffer them upon the altar of the comments section.

But only metaphorically.

Bad chemistry I – Poisoning

A writer’s guide to poison, poisoning, and poisonous things

This article is going to provide a brief overview of toxicology, highlighting the concepts and considerations that I believe will be most useful to writers.

This will be a series of articles. The next will discuss individual poisons in more detail, with others covering types of chemical hazards as well as radiation.

I’m therefore not going to spend much time listing exotic poisons or talking about individual substances in this article, although I will be giving examples where relevant.

 

Confusing words-

These are the words with the important definitions.

Poison – This refers to any substance that interacts with the functioning of a biological organism in a negative way.  It’s the umbrella term. As this definition could actually encompass just about every known substance, the term tends to be used only to describe substances in the specific context that harm is likely to occur, or substances that are particularly prone to negative interactions with biology.

Toxin –  Toxins are poisons that are produced by a biological organism. Toxins are poisons, but not all poisons are toxins. This is not helped by the way that most words that derive from the same Greek root, such as toxicology, toxic, or intoxication, are all general terms.

Venom – The next step down, a venom refers specifically to a toxin that is injected directly into another organism, whether by sting, fang or claw.

So, arsenic introduced into food is a poison, but not a toxin or a venom. Ricin, which is derived from the castor oil plant, is a toxin, but not a venom. If you are bitten by a snake, then the venom that it injects is also a toxin and a poison. All three substances are toxic.

There are grey areas here, but in practical terms you just need to consider the origin of the substance at hand. If it came from an animal or plant, you can call it a toxin; if it was injected by that creature, it’s also a venom. If neither of those applies, you should just call it a poison.

So, what makes a substance poisonous?

 

Mechanism –

There are a lot of different ways that a poison can interfere with biology. This can involve a relatively crude chemical reaction, such as the corrosive destruction of tissue, an overloading of normal biological processes, or very specific and complicated interactions with biochemistry.

In general, the biologically derived toxins tend to be large complicated molecules, often proteins, which have complex interactions with biological processes. Venoms, especially, are often composed of multiple discrete substances, often with separate modes of action.

The more complex the molecule and interaction, the more likely it is that the poison will be specific to a given organism, tissue, or situation. As I’ve discussed at length in an earlier article, complex interaction between unrelated biochemical systems is unlikely, something which may become a problem when writing “hard” science fiction.

 

Dosage-

Probably the most important concept in toxicology, famously stated by Paracelsus, who is considered the father of modern Toxicology –

“All things are poison, and nothing is without poison; only the dose permits something not to be poisonous.”

In large enough amounts even water can be toxic, and I’m not talking about drowning here.

In the abstract, any living organism can be considered as a very complicated series of chemical reactions, and so, if you throw enough of any other chemical at them, eventually bad things will tend to happen.

Conversely, many substances which are widely recognized as poisons are present in small amounts all around us. Chances are that you have accidentally consumed an apple pip at some point in your life, without immediately expiring as a result of the tiny quantities of cyanide that it contained.

It is dosage then that is most often used when comparing the toxicity of substances. You may have seen this expressed as LD50 values, a rather grim notation in which the figure indicates a quantity of a given substance required to kill half the organisms exposed to it.

LD50 values must be approached cautiously for a number of reasons. Firstly, for reasons that should be obvious, human toxicology data is seldom acquired by careful and systematic experimentation, rather from anecdote and reconstruction of tragic events, contributed by people who were often too busy at the time to take careful measurements. Human data is often missing altogether for many substances, and estimates can only be made by extrapolating data from other organisms.

Beyond that there is still room for confusion. LD50 values should specify weight and administration, but this information is often omitted in favor of the lowest possible LD50 value. Consider that a 6’3” male can readily weigh 50% more than a 5’2” woman and more than three times as much as a 5 year old child. For some substances the dosage will not scale well with mass anyway. Children, the elderly, and those in poor health or with specific conditions may be more susceptible, and, as will be discussed, genetics and behavior often play important roles as well.
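
To make the weight problem concrete, here is a minimal sketch of how a single per-kilogram LD50 turns into very different absolute amounts for different people. The LD50 figure, the body weights, and the function name are all invented for illustration; none of them describe a real substance.

```python
# Hypothetical illustration only: the LD50 and body masses below are
# invented numbers, not data for any real substance.

def median_lethal_amount_mg(ld50_mg_per_kg: float, body_mass_kg: float) -> float:
    """Scale a per-kilogram LD50 to an absolute amount for one body mass."""
    return ld50_mg_per_kg * body_mass_kg


LD50_MG_PER_KG = 10.0  # assumed value for an imaginary poison

body_masses_kg = {
    "tall adult man": 90.0,
    "short adult woman": 60.0,
    "five-year-old child": 20.0,
}

for person, mass in body_masses_kg.items():
    amount = median_lethal_amount_mg(LD50_MG_PER_KG, mass)
    print(f"{person}: about {amount:.0f} mg")
```

Straight-line scaling like this is itself only a first approximation, for the reasons given above, but it shows why an LD50 quoted without a body weight tells you very little.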

 

Administration route-

It’s not uncommon to hear journalists describe any large, allegedly newsworthy, accumulation of toxic substances according to the number of people that could potentially be poisoned, as in “enough botulinum toxin, to kill 1000 elephants”. This is a lot like describing the contents of a knife shop by trying to estimate the number of people that could be stabbed. Poisons do not magically distribute themselves amongst and within the population that they may be intended to poison.

Some substances are readily absorbed through the lungs, or directly through the skin, but others must be ingested or even injected directly, something that is less likely to happen by accident.

Even after the poison enters the body, it still has to reach the susceptible target tissue. It’s quite possible for an injected poison to become trapped in muscle tissue rather than reaching the bloodstream. The brain is protected from many toxic substances by a system of filtration, termed the blood-brain barrier. Certain diseases can reduce the effectiveness of this barrier, which will render the sufferer susceptible to those toxins.

If a poison is in the form of a gas, or absorbed through skin contact, it is likely that many times the lethal dose will be needed to ensure that enough is taken up by the target.

 

Metabolism-

What happens to the poison after it enters the body is also important. The body is effective at removing molecules that shouldn’t be present, or which no longer serve a purpose. This is a progressive, but surprisingly predictable process, where the offending molecules are cut up into progressively smaller metabolites by metabolic enzymes and then excreted.

For toxic substances this often means neutralization, but other substances that weren’t initially harmful may actually be transformed into a toxic form.

An important concept here, that you might come across when researching this subject, is that of first pass metabolism, which describes the way that drugs absorbed in the stomach will pass immediately through the liver, where a lot of metabolic processes take place. This means that even some substances that are absorbed well by the stomach can behave dramatically differently when injected directly. It also means that the precise site at which an injection of toxic substances occurs, both relative to the target tissue and relative to the liver, can be important.

These metabolic pathways are often shared, and can potentially be overloaded. The reason that alcohol is contraindicated when taking many types of medication is that it shares a common metabolic pathway with them. Other substances can directly inhibit metabolic enzymes; this is true of chemicals contained within grapefruit, of all things, the fruit of nightmares if you are a pharmacologist.

This can all have interesting implications. In the case of ethylene glycol (antifreeze) poisoning, the toxicity results from metabolism by the alcohol metabolizing pathway. Because the pathway has a much higher affinity for ethanol, the poisoning can be treated by giving the victim large quantities of ethanol until all the antifreeze has been excreted. Essentially, treatment involves getting as drunk as possible.
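
The competition between the two substances can be sketched with the textbook equation for competitive enzyme inhibition (the Michaelis-Menten form with an inflated apparent Km). This is only a toy model: every constant below is an arbitrary illustrative value rather than measured data, and the function and variable names are my own.

```python
# Sketch of competitive inhibition using the standard Michaelis-Menten form.
# All constants are arbitrary illustrative values, not measured data.

def metabolism_rate(substrate: float, vmax: float, km: float,
                    competitor: float = 0.0, ki: float = 1.0) -> float:
    """Rate at which an enzyme processes `substrate` while a competitor
    with inhibition constant `ki` occupies the same active site."""
    apparent_km = km * (1.0 + competitor / ki)
    return vmax * substrate / (apparent_km + substrate)


VMAX = 1.0        # arbitrary units
KM_GLYCOL = 10.0  # assumed: the enzyme binds the poison weakly
KI_ETHANOL = 0.1  # assumed: the enzyme binds ethanol far more tightly

glycol = 5.0
alone = metabolism_rate(glycol, VMAX, KM_GLYCOL)
with_ethanol = metabolism_rate(glycol, VMAX, KM_GLYCOL,
                               competitor=5.0, ki=KI_ETHANOL)

print(f"toxic metabolite production without ethanol: {alone:.3f}")
print(f"toxic metabolite production with ethanol:    {with_ethanol:.3f}")
```

With the enzyme kept busy by ethanol, far less of the poison is converted into its harmful metabolites, which buys time for it to be excreted unchanged.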

Metabolic pathways can be up-regulated or down-regulated by the body over time, in response to prolonged exposure to the offending metabolite. This is the mechanism by which people build up resistance to poisons, medication, or drugs of abuse, which will, in turn, influence their susceptibility to other drugs or poisons metabolized in the same way. Metabolism can also vary as a result of genetic factors, especially across different ethnicities. The activity of the enzyme complicit in the toxicity of paracetamol (acetaminophen), for example, has been observed to vary by a factor of as much as 50 between individuals, and will increase with chronic alcohol exposure, increasing the risk of poisoning.

 

Elimination-

Some substances are metabolized or are excreted from the body almost immediately, but others persist for a very long time. This means that repeated exposure to relatively small amounts of that substance over a long period of time can result in poisoning. This is particularly common with heavy metal toxicity.
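
A rough way to picture this is to track how much of a substance is left in the body when a small dose arrives every day and a fixed fraction is cleared between doses. The sketch below assumes simple first-order elimination described by a half-life; the dose and the three half-lives are invented for illustration, and the function name is hypothetical.

```python
# Sketch of body burden under repeated daily exposure with first-order
# elimination. Dose and half-lives are invented for illustration.

def body_burden_after(days: int, daily_dose: float, half_life_days: float) -> float:
    """Amount remaining in the body just after the dose on the last day."""
    fraction_kept_per_day = 0.5 ** (1.0 / half_life_days)
    burden = 0.0
    for _ in range(days):
        burden = burden * fraction_kept_per_day + daily_dose
    return burden


DAILY_DOSE = 1.0  # arbitrary units

for half_life in (0.5, 30.0, 3000.0):
    burden = body_burden_after(365, DAILY_DOSE, half_life)
    print(f"half-life {half_life:>6} days: burden after a year = {burden:.1f}")
```

A substance that clears in hours never builds up much beyond a single dose, whereas one that lingers for years ends up holding most of a year's worth of exposure at once.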

Aside from normal metabolic processes, the toxicity of a substance is also influenced by what happens to individual molecules whilst exerting their harmful effect. A poison that is consumed in that reaction is usually going to be less problematic than one which can act as a catalyst for other reactions, as this means that the molecule itself will persist and continue to produce harm. Some enzyme molecules are individually capable of catalyzing more than a million reactions per second without being used up.

Also important is the fate of whatever substances the toxin is reacting with. Some toxins are eliminated quickly, but irreversibly destroy the target molecules or tissue, and the body’s ability to quickly replace these substances or repair the damage is limited. Nerve agents, for example, work by binding to acetylcholinesterase, an enzyme involved in nerve transmission, often producing permanent bonds with the molecule, which would otherwise be taken back up and recycled by the body, resulting in prolonged poisoning. One of the drugs used to treat poisoning of this type, pralidoxime chloride, acts to reactivate the enzyme and restore normal function.

Speaking of which…

 

Antidotes and other treatments-

Almost as important to fiction as the poison, is the concept of the antidote, the magical wonder drug that will infallibly reverse poisoning in seconds, if administered at any point before death (the time of which can be estimated to the second).

Needless to say, it’s not quite that simple in reality.

It’s important to realize that, whilst many types of poisoning can be treated with a specific drug, these treatments are not often fast, infallible, or even safe. Some substances counter the poison by producing an opposing effect, in essence poisoning the patient in the opposite direction. Others do nothing to repair harm, but facilitate the breakdown or excretion of the poison, prevent it from binding to its target, or even just compete to produce the same mischief in a more reversible way.

Because animal toxins are often composed of large and complex molecules, antibodies produced by the immune system are sometimes effective against them, allowing an immune response to the toxin to develop. This is exploited in order to manufacture antiserum, extracted from animals deliberately exposed to the toxin. However, these treatments can produce a severe immune response from the patient themselves, especially on repeated exposure, and are sufficiently risky that, in reality, not introducing the antiserum to the patient is often judged the lesser risk.

More apparently cinematic antidotes do exist, but even then there can be unexpected problems. Opioid poisoning can be spectacularly reversed in seconds by injection of the drug naloxone (which also produces the instant onset of withdrawal symptoms if the poisoning was caused by abuse), but the effects of the antidote are of shorter duration than the underlying poisoning, meaning that unless the patient is given subsequent doses, they will relapse when it wears off. Preventing a spectacularly withdrawn drug addict from leaving the emergency room in search of another fix, after being apparently cured of their overdose, can be a real problem for a doctor.

When writing dialogue it’s probably sensible to avoid having a doctor or biologist use the term antidote at all. They may use it in individual cases, especially when talking to patients, but in general they are more likely to think and talk about these substances in terms of “medication” or “treatment”.

Just as important to the real world treatment of poisoning is the physical reduction of poison absorption. If the poison has been taken in by mouth this means inducing vomiting (when safe to do so), pumping out the stomach, or introducing activated charcoal into the stomach, which will neutralize a lot of substances before they can be absorbed.

If the substance is absorbed through the skin, the victim can be washed (often with a mild bleach solution). If the poison has been injected it is sometimes appropriate to attempt to recover the poison by suction or use a tourniquet to prevent its spread into general circulation.

Useful resources for science fiction writers

The following links represent some of the best articles and resources I’ve come across while putting this blog together.

 

Articles about writing, by writers

Holly Lisle – writing blog with an extensive series of how to articles.

Charles Stross – Lots of useful material on his blog. Of particular note, “Common misconceptions about publishing”

Catherynne Valente – Recent set of articles, also hosted at Charlie’s blog, “A Numerical List of Loosely-Connected Thoughts on Writing” (2,3,4) and an additional article about digital publishing. Her own website is here.

Neil Gaiman – Lots of advice on his site including “Advice to authors”, and an article here about literary agents, which also has a lot of additional links.

J. Steven York – Excellent blog, which includes the article “Writers and other delusional people”

 

Screenwriting

JohnAugust.com – Probably the best screenwriting blog. Has a great directory of its own resource articles.

Josh Friedman – Another excellent screenwriting blog.

GoIntoTheStory.com – Again, lots of very useful resources here, although not the most accessibly structured site.

Screenwritinginfo.org – A very nice overview of basic screenwriting concepts.

 

Technology

A trade secret, for those of you who aren’t irredeemably geeky. If you need solid, well researched, reference texts about future technology or sci-fi stuff, you can do a lot worse than check out Role Playing Game (RPG) source books. The best of these, from a reference viewpoint, are probably those written for the Eclipse Phase and Transhuman Space settings, Star Hero, and the other GURPS source books (especially Space, Ultra-Tech, and Bio-Tech). These aren’t free resources, but can be very worthwhile purchases for any science fiction author. A lot of them are available inexpensively in pdf format, or can be bought cheaply second hand.

How Stuff Works  – By no means exhaustive, but the articles that are present tend to be well researched.

DARPA – Some interesting insight here into what the US government is throwing money at.

SETI projects – Again, great info on their funded programs.

Wired technology blogs – An asset to any writer’s feed reader. Especially Danger Room, Gadget Lab, and Wired Science

Slashdot – Turns out 80’s cyberpunk novels were pretty bang on the money about a lot of stuff.

Technology Review – The MIT technology blog, yet another daily source of plot hooks.

TED – Lots of great stuff here, if you’ve got the bandwidth.

 

 

English and Grammar

Become a better writer – At Writing World, linked separately below.

The Elements of Style – The classic reference work by William Strunk Jr., available online at Bartleby.com

Grammar Handbook – From the University of Illinois at Urbana-Champaign

Guide to grammar and style – Jack Lynch

 

Other useful stuff

Overused story ideas – From Strange Horizons. Check here, ideally before you write the nine novel series.

National statistics – Official data here for the UK, US, Canada, and the EU. Also the CDC. Nielsen also publishes some very useful data publicly.

UNTERM – United Nations terminology database, attempting to keep track of context specific word usage in 8 different languages. This is where they send the linguists and database administrators who have been very bad.

The Urban Dictionary – A lot of content is NSFW, so be warned!

WolframAlpha – A computational knowledge engine. If you need to put a value to just about anything, this is the place to start.

Kate’s Onomastikon – A listing of common names for different cultures.

Writer Beware – Helping writers avoid scams.

Writing World – Lots more useful resources.

TVTropes – Probably becoming less useful over time, as fans continue to figure out new ways to shoehorn Naruto into every single category. Still a very handy listing of common tropes, and useful for finding fiction that has used similar ideas to your own.

 

This document is a work in progress, so feel free to contact me or comment if you have links that you think I should include. I’m trying to be selective though, so please understand that I may not use your suggestion.

 

 

 

 

 

 

 

Normal service may be assumed at some point – A public service announcement

In the unlikely event that anyone is following this blog, they may have noticed that updates have been somewhat erratic. At present I am very much engaged in job hunting. This is occupying a lot of time, and obviously has to take priority over everything else.

I am hopeful that I can get things settled down into a more normal schedule at some point in the immediate future, but until that happens I’d prefer not to get too carried away with filler content to the point of drowning out the writing resources that I’m intending to be the main focus of the site. My interest in the site is not waning, and I will still be updating the site as often as possible; it’s just that the updates are unlikely to adhere to a rigid schedule.

My plan for this website has always involved the long haul, and I hope that my readers can have some patience.

I would very much welcome any feedback that people can give. It’s easier for me to justify devoting time to the site right now, if you can give me some reassurance that people are actually finding useful content here. Conversely if people aren’t finding the site helpful, then I need to know that, especially if you can offer constructive suggestions as to how to improve things.

Thanks

The Wizard

The wizard has lived in the tower for a very long time, he is old and he is weary.

 

He will not help the boy.

 

The boy is young and inexperienced, but he learns fast and works hard. He asks only to be taught how to strike against the Darkness, as the prophecies demand.

 

The wizard sets the boy to menial tasks, he sends him to gather wood in the forest, or lay another coat of paint on the tower’s exterior. He makes the boy clean out his stable. He sends him on long pointless errands, and offers nought but discouragement.

 

The boy perseveres and, in time, begins to see wisdom in the wizard’s demands. He believes that he sees how they make him stronger, how they test his resolve. The boy shows nought but determination.

 

Eventually the wizard acquiesces, he gives the boy a sword and tells him the secrets. He speaks of dragons, of werewolf haunted forests, of dangerous swamps, icy tundra, and treacherous ravines. Most of all, he warns about the Darkness.

 

The boy takes the sword. The sword is notched and battered, it is dented, and the binding is charred, but it is unmistakeably the sword from the prophecy. It is the sword that will slay the shadow.

 

The wizard hopes that the boy is the hero of legend, the Wolf of Autumn, a just and glorious saviour, praised by happy smiling children for millennia to come. After all, it is prophesied.

 

The boy knows that he is the one. The hero, who will right the world, and avenge his family. He will bring a light into the darkness, which will burn for millennia to come. After all, it is prophesied.

 

The wizard thinks of the other boys who found the tower. The ones he could not drive away, the ones who lie crushed and burned, drowned and frozen. He remembers children who now run with the wolves in the woods, children who now serve the Darkness.

 

The boy leaves the tower, he fears the road ahead, a little, but he is resolved. In his mind’s eye he already sees the sword shattering the Darkness. Beyond that, he sees pennants and princesses, a bright shining future stretching ahead.

 

The wizard has lived in the tower for a very long time, he is old and he is weary.

A writer’s guide to genetics

As with many of my other articles this will not be an exhaustive guide, but rather more of a conceptual roadmap of the subject. I’m going to be going through what I think are the most important concepts for the largest numbers of my readers to understand (ideally, both of you).

I’m not just going to be talking about genetics though, because I tend to the view that it can only be properly understood when viewed in a slightly wider framework.

 

You’re going to feel cheated if I don’t tell you what the letters stand for, right?

DNA – Deoxyribonucleic acid. Composed of two separate polymeric chains of nucleotides, each containing one of four bases. It is the sequence of these bases, Adenine, Cytosine, Guanine, and Thymine, that encodes the information.

DNA, when collected in large quantities, smells strongly of vinegar, is apparently edible, and I am informed that it tastes just as disgusting as you might expect.

Your genome is the sum total of genetic information within an organism. This term can refer to a specific individual, but is often used in reference to entire species, such as with the human genome. Much of your genome actually consists of so-called junk DNA; this isn’t unimportant, but it doesn’t code for proteins.

 

Blueprint of what?

The word blueprint is thrown around a lot when people talk about genetics, but this is somewhat misleading. DNA actually contains information about the individual parts, not the whole.

For DNA to do anything, it must be expressed. This process involves first transcribing the DNA into RNA, which can be considered to be a temporary working version of the original master copy. This RNA is then translated into a polypeptide.

Polypeptides are chains of amino acids, of which there are twenty different types, each of which is specified by one or more three-base sequences (or codons) in the original DNA (the gene).
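
For the programmers in the audience, the two steps (DNA to RNA, then RNA to polypeptide) can be sketched as a pair of lookups. This is a toy model under loud assumptions: the codon table is only a small subset of the standard genetic code, the gene sequence is invented, and the input is treated as the coding strand so that making the RNA copy reduces to swapping T for U. The function names are my own.

```python
# Toy sketch of gene expression: DNA -> RNA -> polypeptide.
# The codon table is a small, partial subset of the standard genetic code,
# and the DNA sequence is invented for illustration.

PARTIAL_CODON_TABLE = {
    "AUG": "Met",  # also the usual start signal
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "UGG": "Trp",
    "AAA": "Lys",
    "UAA": None,   # stop
    "UAG": None,   # stop
    "UGA": None,   # stop
}


def transcribe(coding_strand_dna: str) -> str:
    """Treat the input as the coding strand, so RNA is the same with U for T."""
    return coding_strand_dna.upper().replace("T", "U")


def translate(rna: str) -> list[str]:
    """Read the RNA three bases at a time until a stop codon is reached."""
    amino_acids = []
    for i in range(0, len(rna) - 2, 3):
        residue = PARTIAL_CODON_TABLE[rna[i:i + 3]]
        if residue is None:
            break
        amino_acids.append(residue)
    return amino_acids


gene = "ATGTTTGGCGCTTGGAAATAA"  # hypothetical seven-codon gene
rna = transcribe(gene)
print(rna)
print("-".join(translate(rna)))
```

Because the table is deliberately partial, any codon outside it raises a KeyError; a full table has 64 entries, which is why several codons can map to the same amino acid.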

 

Turns out our biology is largely based on origami

These polypeptides fold into proteins, which may incorporate multiple polypeptide chains, as well as non-peptide molecules.

The resulting proteins form most of the complex molecules within your body, and it is the interactions between them that actually determine most of what happens within it.

Imagine that every single item in your house was produced by squirting out chains of lego bricks that folded themselves into everything you could possibly need. I’m not just talking about your chairs and mugs here, I’m talking about your television, your fridge, your new conservatory. No assembly or planning would be required, the mechanisms that control all of this folded from the little chains too.

There is no possible way for words to do justice to the complexity of the entire system. I talked a little while ago about how DNA is expressed as polypeptides. The animation linked here covers the processes that make this happen in more depth, and it is mind blowing.

 

The point of all this

This is the thing that you absolutely need to understand. There is nothing in your DNA that says “arm go here”. Your arm happened because some proteins fold into cells and those cells respond in specific ways to their chemical signals, neighbouring cells, pressure gradients, and a thousand other things, and that happens because of a system of interaction between proteins that has continued uninterrupted pretty much since proteins first happened.

The specific system of protein interactions within an organism is termed the Proteome, the study of it is called Proteomics, and its practitioners should probably be termed Masochists. We tend to fixate on genetics, mainly because it is much easier for us to understand and influence, but it is only part of the picture.

This is a wildly complicated, emergent system that has arisen over millions of years, with no limitations to the factors that could influence it, no logic or requirement for consistency evident in its processes, and with no part of how it actually works documented anywhere, including your DNA. You probably can’t comprehend the true horror of this, unless you have worked in IT.

Essentially your DNA is just the HR filing system for the corporate entity that is your body. Every so often a Mummy corporation and a Daddy corporation will love each other very much, and a little spin-off company will be sent into the world, but it won’t succeed or fail solely on the quality of the paperwork that it inherits.

We all started out as a single cell, and that cell had DNA in it, but it also had a payload of additional biochemistry which was important too.

I suspect that there is also a joke in here about editors, but I’m worried that, if I find it, the metaphor police might actually start issuing warrants.

Anyway, this stuff has implications, even for writers.

 

Sample implications, provided for your convenience

1)      It’s very difficult to reconstruct an organism solely from genetic information – You can’t reconstruct the organism without recreating the proteomic environment that can use that information. Whilst the genome will contain the information for the pieces of that proteome, it doesn’t tell you how they fit together, and without context you probably can’t decode the genome anyway, much less work out how its sequences correspond to proteins. This is a chicken and egg situation that involves actual eggs.

Say we find some dinosaur DNA. It’s very unlikely that we can plug that DNA into the egg (in the sense of the actual ovum cell) of any modern creature to produce a dinosaur, but it’s possible that we could use our understanding of genetics and proteomics to reconstruct a viable egg, that will in turn produce a dinosaur, and then, as Hollywood teaches us, some kind of theme park and a whole bunch of lawsuits.

But if we found some Alien DNA analogue, then even if it worked in a very similar way to ours, even if we could identify patterns and codes at the genetic level, it would still be almost impossible for us to work out anything about the proteome of the organism, because the DNA doesn’t actually contain information about how proteins work.

 

2)      Complexity in biotechnology tends to scale exponentially – Say you want to turn an organism’s skin green. You find a green protein somewhere, plug it into the target organism’s DNA, and then try to get it expressed in a way that doesn’t interfere with anything vital. Organism is now green.

If instead you want to make an organism grow scales, you need to physically change the shape of cells, but you can’t do this without understanding the processes that govern the shape of the cell membrane, and understanding exactly how to change the peptides to achieve that. You need to be sure that changing the shape of the membrane doesn’t change the function of all of the important receptors and channels that go through that membrane, or interfere with its ability to take in nutrients, or the resources required for the cell to live. Then you need to consider how changing one cell affects its interactions with other cells, how to get the tissue formed by those cells to grow into new shapes, and then stop growing into new shapes. Can the organism still perspire? If it can’t, how are you going to deal with that? What about circulatory changes, proprioception…? I could go on all night.

 

3)      Genetics will probably always involve experimentation, probably on living beings – Protein interactions are incredibly hard to predict. In the example given above, expressing a green protein to turn the organism green has a fair chance of working, and is relatively simple. Predicting in advance, however, that a tiny corner of that molecule won’t interfere cataclysmically with some random cellular process, is the exact opposite of relatively simple. You could try growing the cells in a petri dish first, but your first test subject could still die of liver failure because your green protein can’t be metabolized. And this is just the simplest case; the more complicated the changes, the harder it gets to predict the outcome.

 

4)      It’s not plagiarism if you are a biologist – The easiest way of dealing with the complexity issue is to find biology that has already happened and steal it. Any kind of DNA tampering is much more plausible if you can find something similar in a closely related creature or, better still, lingering in the organism’s own genome. That junk DNA that I was talking about earlier contains an awful lot of old information from evolutionary history.

 

5)      Biological systems are simultaneously fragile and resilient – Related to the last point, when I described the complexity of protein interactions earlier, it’s possible that I left you with the impression that biology is entirely at the mercy of its own chaotic processes. This is not really accurate.

Biological systems tend to be highly resilient to familiar challenges but vulnerable to the totally unexpected.

Crude interactions with biology can be startlingly effective, because there are other systems that can cope with the problems that you have caused; but there needs to be a reason for those systems to have evolved in place.

 

Using this stuff

As always, don’t let the facts get in the way of the stories; find ways to use the facts. Genetics is always going to be a subject that sees the facts get in the way of some of the fun, but temporarily getting in the way of the fun is not actually incompatible with storytelling.

Acknowledge the problems, then let your characters figure out how to overcome them. Conquering the world with an army of lizard men will be so much sweeter for them if you make them really work for it.

Don’t worry too much about systematic research, but trawling Wikipedia for random genetics facts can’t fail to land you plot hooks.

I do think that proteomics represents a massively underexplored area of science fiction. Anything that can be done to an organism with genetics can also be done, at least temporarily, by interacting with the proteins directly.

The mainstream fixation on DNA could be a weakness to be exploited; perhaps your antagonist could develop an alternative mechanism for inheritability, so that his super soldiers’ enhancements don’t show up in the genetic screening because they aren’t in the DNA.

Conversely, if someone, or something, could overcome the complexity issues, genetics and proteomics in tandem genuinely offer limitless potential to screw around with living organisms. Genes can be switched on and off at will, and pretty much anything could lurk in the forgotten depths of our genomes.

I’d also suggest that there is a lot of inspiration here for those of you who might be interested in writing about nanotechnology. Go back and watch that video again if you don’t believe me.

 

Coming attractions and matinee performances

If you like this article you might also want to check out some of my earlier articles on interactions with alien biology, or plausible medical responses to weird unexpected stuff.

I’m intending to use this article as a springboard for a couple more articles, one on mutation, which is going to be taking a slightly more Saturday morning cartoon approach to this subject, as well as talking a bit more seriously about inheritance, and the other is going to talk about viruses.

Hope to see you back for them.

Horatio’s Dragon

 

Picture, if you will, a figure. Short grey hair, balding pate, thick glasses, and advancing age, no doubt dressed in a lab coat, or perhaps a waistcoat with a pocket watch. This is precisely the sort of man that might be envisaged by you, the reader, in your prejudice, upon hearing a name like Horatio Pemblethwaite.

 

You would be almost completely wrong, of course.

 

Horatio was only 28 years old at the start of this narrative. He was tall and effortlessly lean, in that special kind of way that drives the effortlessly rotund to private murderous rage. His hair was short, blond, and neatly cut. He was only moderately and unsurprisingly unattractive. He was an engineer, but he worked mainly with computers and only wore a white coat when important people were being shown round his place of work.

 

Mostly though, Horatio was obsessed.

 

Horatio’s passion was for dragons. His feelings on the subject were of similar intensity and scope to the enthusiasm Genghis Khan once felt on the subject of land ownership; they had come to define his life.

 

This had been the case for as long as anyone could remember. Even his parents could not identify the precise point at which this fixation developed, but it is certainly true that dwagon was the second word that he spoke; his first utterance was burrito, for reasons that will likewise forever remain opaque to logic, and which was, in any event, mistaken by his parents for gas. He was never able to articulate exactly what fascinated him so about giant flying reptiles, but that fascination was undeniable.

 

Horatio had seen almost every film ever made that featured dragons, in itself quite a feat because, as my reader may or may not be aware, most such films are quite uniformly awful; Horatio didn’t care. He greedily devoured many a poorly written novel for even a trace of draconic content. At college he often played Dungeons and Dragons with his friends, but truth to tell he was always somewhat ambivalent about the dungeons. The small flat he lived in during his student days was plastered with posters and pictures and filled with mobiles and models, producing an unsettling visual effect, a continuous sea of scales, wings, and talons which quickly overwhelmed the eye. At any rate, the only girlfriend he invited back to the place soon developed an inexplicable headache.

 

But all these faux dragon surrogates left him wanting, a void still unfilled in his life, nay in his very soul, for Horatio’s burning ambition was to meet, in the flesh, a real live dragon.

 

This would, of course, be an unfortunate pipe dream, one of those tragic everyday quirks that bring colour to the people around you, if not for the minor detail of Horatio’s employer. For, you see, Horatio worked for Otherworldly Incorporated, an ambitious new start-up, who were determined to exploit what they saw as universes of untapped potential.

 

I must take this time to explain to some of my readers, the slower ones, the essence of their business plan.

 

Many contemporary physicists believe that there is not one universe but an infinite number of them, a multiverse of dimensions, which are constantly splitting off from each other at each juncture of probability, spawning new universes for every imaginable possibility, and many more unimaginable ones. Almost uniquely for contemporary physicists they are in fact, just this once, absolutely right.

 

There exist, out there in the great void of possibility, universes ruled by super intelligent video games. There are whole galaxies which, due to the influence of bizarre otherworldly Von Neumann machines, are composed of nothing but pasta. Untold trillions of worlds exist in which Stalin was a noted humanitarian, Copernicus was wrong, David Icke was right, and in which you, the reader, are rich, successful, intelligent and the envy of all mankind. There are necessarily an infinite number of separate universes available to meet any possible specifications.

 

Contemporary physicists also love to spoil other people’s fun, and will therefore also tell you that such universes are unable to interact with each other in any substantive manner at all. In this respect, at least, they return to form and are decisively but authoritatively wrong.

 

Otherworldly Inc had found a way to reach them. They had, using a quite simple process that, alas, the narrator is unable to disclose to you for reasons of understandable pragmatism, discovered a way to open the doorway to other worlds.

 

I’m sure that the less challenged of my readers understand something of the way that one can track certain entangled particles as they flit between levels of reality and perhaps use such tenuous links to forge gateways, or how one could construct quantum computers to spy recursively on their alternate counterparts. That it might even be possible to tamper with the source code of the multiverse itself. But they would, no doubt, also appreciate that all of the aforementioned are possible only in a way described by that special kind of theoretical that is customarily invoked by physicists who harbour the secret ambition to see their work mentioned in mainstream newspapers. Otherworldly Inc did none of these things; the solution that they had found was much more elegant.

 

The managing directors were justifiably excited about their discovery and the spectacular implications it might have, both for humanity and their own shareholder dividends. They thought it could be used to access near infinite resources and vast cultural treasures; they did not hesitate to speculate, for the benefit of their investors, how it could provide comfortable living space for all of humanity. Like so many of history’s technological pioneers they never really considered that mostly people would find ways to exploit their invention to find new and interesting ways to have sex.

 

First however, they had a problem.

 

As any of the aforementioned spoilsport physicists will no doubt tell you, merely opening the doorway is not nearly enough. Infinity, as it turns out, is a very big place and most universes contain nothing that could be described as anything, let alone anything that might be described as interesting. If their new invention was to be useful for anything other than garbage disposal or organised crime, a means would have to be developed to sort the wheat from the cosmic chaff and identify the incomprehensibly rare universes of narrative significance.

 

Horatio, our protagonist, was employed to resolve this problem, and he threw himself into the task with single minded intensity for, as you may already have guessed, Horatio had seen a sudden and unexpected opportunity.

 

It was an extraordinarily difficult problem, the sort of trial that is, inconveniently, overcome only by dint of many years of work, most of which involved very complicated sums and so, whilst I hate to disappoint the numerical masochists among my readership, I will skip the specifics. Suffice it to say Horatio eventually succeeded in patenting a process by which the staggering number of universes could be quickly and efficiently investigated and categorised, in the process ensuring his own unfettered access to the vast infinite reaches of potentiality.

 

Humanity had, after a brief spate of hurried press conferences, collectively rejoiced. Here at last was the means to end all wars, suffering, and tiresome weekly recycling responsibilities. Most importantly of all, here too was an opportunity to have lots and lots of sex.

 

Thousands of technicians set to work investigating and cataloguing dimensions full of interesting resources such as precious metals, oil, food, unpublished Shakespeare and J K Rowling folios, and “not unattractive young women inexplicably in need of a single virile male to repopulate their entire race”.

 

Horatio was looking for none of the above. Horatio was, as should now be blindingly obvious, looking for dragons.

 

As most of those tiresome physicists would be able to speculate, there is a tremendous gulf in the probabilities involved between finding something of interest and finding something specific, but even with this consideration in mind Horatio’s search took him far longer than he had anticipated. Nevertheless, after almost half a decade of tireless work, he suddenly and unexpectedly awoke to the insistent beeping noise telling him that his inter-dimensional search engine array had found him something that matched all the extensive specifications he had given it.

 

Horatio approached this finding with the quiet but jittery kind of calm that can only be achieved by transcending excitement; he spent several months carefully preparing for his expedition. After all, he had waited his entire life to see a dragon, he could wait a little longer if it meant he was properly prepared. No consideration went unconsidered, and no expense was spared, and that expense was considerable, but this was of little object because as you might expect Horatio’s discoveries had made him rich beyond easy reckoning.

 

Tragically, however, all his efforts seemed in vain. When, finally, he stepped through the portal into the cold and unremarkable landscape of what would otherwise have been a singularly uninteresting world, there was nothing to be seen, no majestic draconic wildlife for him to observe. All that could be found, after an extensive search, was the single gargantuan decomposing corpse of what may, or may not, have been a dragon.

 

Horatio was distraught but not discouraged; evidently dragons had been present, and obviously some freak happenstance had wiped them out while he had been preparing for his grand expedition. If he could find dragons once, then clearly he could find dragons again, and so, freshly resolute, he set back to work.

 

All this time human society progressed largely as before, with only the minor side effects of free extra dimensional travel to disturb the usual human preoccupation with war, greed, and more war. The abundance of anti-matter universes had provided cheap energy for all, as well as sharply increasing the general public’s interest in the non-local distribution of power stations.

 

There was the massive increase in emigration, all sorts of exciting new kinds of immigration for people to get upset about, the quite endearingly inept pan dimensional invasion of the treacle people from candy dimension 2672 and the subsequent bankruptcy of Tate and Lyle in the wake of the resultant drop in the wholesale price of sugar.

 

Most of these events were far less interesting than you might imagine though with the exception of the latter which, by a curious quirk of fate, was very nearly precisely as interesting as it sounded.

 

The world government had, of course, fallen to militant feminism shortly after Horatio’s great invention was first unveiled. It had all started when the women of the world had collectively complained that some of the men were using the device in ways that could possibly be construed as being quite horribly sexist. The men had replied that they were sure that they could find the women plenty of dimensions full of flowers, clothing, ponies, and other girly stuff, but this had not in fact gone down terribly well and many of the men of the world had been collectively shouted at.

 

The men of the world went away and thought for a little while but not, alas, for quite long enough, and a few too many of them had then quite innocently suggested that if the women were concerned about inequality, they were quite sure that it would be straightforward to locate an abundance of dimensions full of “not unattractive young men, inexplicably in need of a single fertile female to repopulate their entire race”.

 

The coup had been relatively quick and bloodless as such things go and, all things considered, had probably been for the best. It made far less difference to the general scheme of things than either gender would have been prepared to admit beforehand, with only the odd minor quirk, such as the fact that the trains started to run on time.

 

But I digress.

 

It took Horatio another ten years to find another dragon but, this time, he was ready. This time he didn’t wait calmly to organise an expedition, he didn’t don his custom made asbestos suit or pause to clutch his jumbo strength tranquilliser gun, this time he ran across the multidimensional control room and straight through the nearest portal. He emerged through the portal just in time to see a grand silhouette stagger erratically through the air, crash into the ground some distance away, and expire in a messy, and well distributed, fashion.

 

“Egad,” exclaimed Horatio, for it is impossible to do something quite as sadistic as to name a child Horatio Pemblethwaite without it having some lingering effect on them. Horatio had finally realised what was happening to his dragons.

 

Obviously the majority of my readership, the smart ones, will be well ahead of Horatio at this point and there is no real need to clarify things, but I suppose I can spell things out explicitly for the incorrigibly slow. Horatio had realised just how intrinsically improbable dragons actually are. He had realised that for a dragon to fly it must be buoyed up by millions of air molecules, that, for it to breathe flame, a million molecules worth of dragon must collectively decide not to catch on fire. In order for dragons to exist in any dimension which the laws of physics would actually allow Horatio to visit, these preposterous impossibilities had to occur every time any single dragon actually did anything.

 

While our friendly contemporary physicist would tell you that they couldn’t even begin to quantify how ridiculously and stupendously unlikely this would be, they would also have to grudgingly concede that logically a nearly infinite number of such dimensions would still exist. They would also point out with a trace of satisfaction that whilst effectively infinite in number and constantly appearing, the laws of reason would ruthlessly expunge the vast majority of these errant dimensions of dragons on a moment to moment basis in a constant churning genocide of possibilities.

 

So it was not just that Horatio’s objective required a dragon to exist in the specified dimension, a dragon, let us not forget, which had evolved and lived out its life entirely against the current of chance. He had also to accept that, as soon as his dragon hunting equipment had detected it, the great universal dice would roll again and his newly discovered dragons would, with overwhelming certainty, become extinct. In order for Horatio to find his dragon, he not only had to find a dimension where dragons existed, he had to find a dimension in which they would continue to exist, at least for the foreseeable future.

 

This is the point at which the pedants among my readership, you know who you are, might feel inclined to object. They would probably begin by complaining that the degree of improbability involved in finding even a single dragon renders this whole tale completely implausible. They would be wrong of course and I can forgive their ignorance. They would then, no doubt, continue to suggest, their voices brimming with whining nasal smugness, that the only way of predicting the future of any given dimension, whilst remaining comfortably in the present, requires charting the course and state of every single atom within it. This, they would contend, would be very nearly totally impossible, and on this single, but significant, point they would be correct.

 

Horatio realised this almost immediately; his sums soon told him that running the calculations required, even to the limited extent that he needed, would take a computer the size of a galaxy, of a complexity orders of magnitude greater than any that has ever existed, or will ever exist, in this universe.

It took Horatio’s machine a quarter of a century to find one.

 

The air was crisp and cold on his skin as he stepped through the threshold; he ignored the nagging pain in his limbs and squinted against the light of the sun as it rose over an unfamiliar horizon. Above him shapes danced through the air as the dragons sang a hauntingly beautiful, and utterly alien, song.

 

That song quickly became a lament, however, as the shapes began to stagger in the air and then spin slowly and gracelessly towards the ground.

 

If Horatio had observed this sight, he would most likely have been struck by a final terrible realisation, that his very presence on this world would disturb the fragile cobwebs of whimsy which would have kept his beasts alive, that this was not a factor that he could correct for and avoid, that he had in fact brought premature destruction to the very wonders he so appreciated, that he could never enjoy the living company of his icons, and that any further attempt to do so would result only in further disappointment, genocide, and possible canonization on account of dedicated service in the field of dragon slaying.

 

This final truth would most likely have destroyed him utterly and, so, it was probably a small mercy that he was distracted by the shadow descending over him, in the final span of seconds before he was crushed utterly by a confused and rapidly expiring dragon.

 

You could speculate, dear reader, that this is a morality play, a cautionary tale, meant to impress upon you the dangers of pursuing a hollow dream at the expense of the other things that life may promise. This may indeed be the case, although it is perhaps worth noting that reality seldom provides a moral to its stories more worthy than that evil often prevails, greed can provide its own rewards, and that few good deeds go completely unpunished.

 

It is also possible that this is nothing of the sort, that this tale ignored the normal and average components of Horatio’s life. It is possible that the tragedy of the story is projected by you the reader and that our Horatio in fact lived a relatively long and happy life filled with purpose, unrelated hobbies, and happy smiling grandchildren, that you have conjured a tortured and empty existence to meet your own twisted expectations of truth.

 

It would be possible for the most cynical of readers to realise the necessary existence of an infinity of Horatios, the lives of whom, even when constrained to fit precisely within the details of this narrative, trace a wide and bizarre plot on some universal tragedy curve, rendering any scientific attempt to extract trite revelation a futile endeavour for those unwilling to spend an eternity studying exceedingly large and complicated graphs.

 

Whilst the statisticians amongst you are thus engaged, the more sentimental of you can therefore be reassured that in at least one universe of an unfathomable infinity, as at least one Horatio looked up for the final time, the expression that crossed his elderly face was not in fact terror, horrified comprehension, or even resignation. That it was one of fulfilment and contentment and true happiness, that rarest of all emotions, and that in his final seconds he spread his arms wide open as he prepared to embrace in a tangle of limbs, scales and claws, his very first dragon.

One chapter reviews – John dies at the end – chapter 14

 

A new series of articles, highlighting outstanding individual chapters from the books that I’ve read.

Starting off with – John Dies at the End, by David Wong

 

Cover © Macmillan

Chapter 14 – John Investigates

“Two steps in, John found himself standing on a faded pink stain on the snow, as wide as a car. He deduced that this was blood, though the truck driver’s body was gone. He stood over this large bloodstain and said, out loud and in the presence of several by-standers, “This is blood! David must have been here””

There’s the unreliable narrator, and then there’s this.

This chapter sees the protagonist relate, with some justifiable skepticism, a parallel series of events as described to him by the titular John. At this point in the story we have already established that John is a pathological liar, a borderline sociopath, and, frankly, not terribly bright. Nevertheless, the book’s main narrative is one of bizarre, reality warping, horror.

So not only does the author get to have tremendous fun writing a riotously funny, Munchausen style, mini narrative, he still gets to keep most of it credible within the framework of the book, something that can continue to pay off within the rest of the story.

This is incredibly funny writing, and magnificent story construction, especially as it’s interwoven to soften some of the heavier components of the main story.

 

The Rest of the book –

This is not a very mature book, at least, not on first examination. There is plenty here that will offend some of my readers.

But it also has to be said that it’s very genuinely immature. This is the voice of the twenty to thirty year old, with nothing apparent to aim for, and with no obvious reason to engage their intellect. It is a voice that, distressingly, is becoming more and more common online. But it’s one that very rarely gets as widely heard as it should, mainly, it has to be said, because of the politically incorrect epithets and the bodily fluids, and the references to genitalia.

Anyone who is already familiar with this book is probably about to accuse me of over sensitivity here. And in the specific context and market in which it has already been incredibly successful, they are probably right. I’m sure the author is happy with the book’s success, and comfortable with the audience that he does have. But this is a book that deserves a wider audience, and I’d like to think that at least some of that audience deserves it.

This is an assured story, especially for a first effort. It’s clever and observant. It can transition effortlessly and instantly between real humor and genuine horror in both directions. It makes a lot of very valid observations, often about things that don’t get talked about nearly enough. It is painfully honest.

It isn’t a perfect book. It loses steam towards the end, and never really delivers on some of the conceptual promises made in the blurb. But I’ve never read a perfect book.

You will not get many opportunities to read a story like this, in a voice like this, delivered so assuredly. Frankly, it’s not a story that could survive being sanitized, even if the author was inclined to try.

I’m not trying to sound all literary here; I wouldn’t know where to start. All I’m trying to say is, if you are reading this review, and you sound like the kind of person who would never read a book like this, then this is a good opportunity to try something different.

Stats 5 – Power to the people, why large groups are just better

Welcome back to An astonishingly useful guide to data analysis for people that don’t like maths. In last week’s exciting episode you learned how to evaluate your data before you start doing maths with it. This week I’m sure you’ll be thrilled to discover that I’m going to be introducing you to some more important concepts.

This is because they are important.

 

Random facts –

Random data is random. People forget this, a lot.

This is especially important when you are using mean values, simply because nothing about a sample mean offers any useful information about its reliability as an estimate of the population mean. It’s very easy for them to mislead.

 

The above chart illustrates the mean values of 5 separate groups of data. The data in each set was comprised of ten entirely random numbers between 1 and 10. The final bar indicates the mean of all 50 values.

In this case you can treat the population mean as 5, and each set as a separate sample group the mean of which is obviously the sample mean. The final bar treats all 5 sets as one single group of 50 values, and indicates the sample mean for that case.

As you can see, even with ten data-points in a group, some of the sample mean values differ substantially from the expected population mean of 5.

Conventional wisdom suggests a minimum of three samples in each group before you start to work with data, but even with 10 samples a deviation of up to 20% from the expected population mean is apparent. If we treat all 50 values as a single sample, the sample mean almost exactly matches the population mean.
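If you would like to try this yourself, here is a minimal sketch. It assumes Python's standard random module as a stand-in for the spreadsheet, and the function name group_means is my own invention for illustration. It builds 5 groups of ten whole numbers between 1 and 10, then compares each group's mean with the mean of all 50 values pooled together. One caveat: for whole numbers drawn evenly from 1 to 10 the exact expected mean is 5.5, slightly above the round figure of 5 used above.

```python
import random
from statistics import mean


def group_means(n_groups=5, group_size=10, low=1, high=10, seed=None):
    """Return (list of per-group means, mean of all values pooled).

    Hypothetical helper for illustration only. Values are whole numbers
    drawn uniformly from low..high inclusive.
    """
    rng = random.Random(seed)
    groups = [
        [rng.randint(low, high) for _ in range(group_size)]
        for _ in range(n_groups)
    ]
    per_group = [mean(g) for g in groups]
    # Flatten every group into one big sample of n_groups * group_size values.
    pooled = mean([value for g in groups for value in g])
    return per_group, pooled


if __name__ == "__main__":
    means, pooled = group_means()
    for i, m in enumerate(means, start=1):
        print(f"group {i}: mean = {m:.2f}")
    print(f"all 50 values: mean = {pooled:.2f}")
```

Run it a few times without a seed. The individual group means should wander about noticeably, while the pooled mean tends to stay much closer to the middle of the range.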

 

This is another set of mean values. There are 20 values here, one for each of 20 groups, and each group contains between 5 and 100 values. Each individual value was randomly generated between 1 and 100, and each bar represents the deviation of that group’s sample mean from the expected population mean (50).

In other words, the graph does not show the sample means, it shows the extent to which they deviate from the expected population mean.

There are a couple of important things to take on board here.

  • A small group is not always associated with a high deviation.
  • The larger the group is, the less likely it is that the results will be skewed significantly by chance.
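The second point can be checked with a quick simulation. This is only a sketch under my own assumptions: it uses Python's random module, whole numbers between 1 and 100, and a made-up helper called typical_deviation that repeats the experiment many times and averages how far the sample mean lands from the middle of the range.

```python
import random
from statistics import mean


def typical_deviation(sample_size, trials=1000, low=1, high=100, seed=0):
    """Average absolute gap between a sample mean and the expected mean.

    Hypothetical helper for illustration only. Each trial draws
    sample_size whole numbers uniformly from low..high inclusive.
    """
    rng = random.Random(seed)
    expected = (low + high) / 2  # exact mean of a uniform range of integers
    gaps = []
    for _ in range(trials):
        sample = [rng.randint(low, high) for _ in range(sample_size)]
        gaps.append(abs(mean(sample) - expected))
    return mean(gaps)


if __name__ == "__main__":
    for n in (5, 20, 100):
        print(f"group size {n:>3}: typical deviation = {typical_deviation(n):.2f}")
```

The typical deviation should fall as the group size rises, but any single small group can still land close to the expected mean, which is the first point in the list.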

It is important to realise that it is still possible for a large group of samples to return a mean that deviates substantially from the true population mean; it just becomes less likely as the sample size is increased. The issue is one of variance, which is one of the next concepts that I’m going to discuss. But before I do that we need to discuss something else.

 

Striving to be normal

You may have noticed that the random data that I gave in the previous section seemed to result in more variability in the sample means than seemed intuitive. This is because of how it was distributed.

It’s probably worth taking the time here to be clear on the difference between random distribution and random generation. Randomly generated numbers are those produced as a result of an ostensibly random process, in this case Excel’s number generation function. The randomly generated values that I showed you before are of uniform distribution; that is to say, the numbers have an equal chance to fall anywhere in the specified range.

One of the reasons that people are so bad at dealing with randomness is that it is not a common occurrence in real life. In reality, data tends to cluster around the sample mean. In fact, a lot of statistical procedures assume that data will follow what is called a normal distribution.

In the above chart each line of points represents a separate set of randomly generated data. The first line consists of 50 values that are uniformly distributed, the second, 50 normally distributed values. You can see that the first set spreads out fairly evenly between 1 and 100, whereas the second set tends to cluster nearer the expected mean of 50, although it still represents a considerable range of values.

The pattern is not as neat as you might expect, because the values are still generated randomly, which means that any pattern of data could have occurred. Excel could have delivered me 50 identical values of 100, for example, in both cases; it’s just (very) unlikely.
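For anyone who wants to see the difference without a chart, here is a rough sketch using Python's random module rather than Excel. The helper names and the standard deviation of 15 for the normal set are my own assumptions, picked purely for illustration. It counts what share of each set falls within 15 of the expected mean of 50.

```python
import random


def make_uniform(n, rng, low=1, high=100):
    """n values with an equal chance of landing anywhere in low..high."""
    return [rng.uniform(low, high) for _ in range(n)]


def make_normal(n, rng, centre=50, spread=15):
    """n normally distributed values.

    spread is the standard deviation; 15 is an assumed figure, and an
    occasional value can fall outside 1..100.
    """
    return [rng.gauss(centre, spread) for _ in range(n)]


def share_near(values, centre=50, width=15):
    """Fraction of values lying within width of centre."""
    return sum(1 for v in values if abs(v - centre) <= width) / len(values)


if __name__ == "__main__":
    rng = random.Random()
    uniform_set = make_uniform(50, rng)
    normal_set = make_normal(50, rng)
    print(f"uniform: {share_near(uniform_set):.0%} within 15 of 50")
    print(f"normal:  {share_near(normal_set):.0%} within 15 of 50")
```

With only 50 values in each set the percentages will jump around from run to run, but the normal set should usually have a clearly larger share near 50.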

Most statistics programs will inform you if your data is not normally distributed, although they will often allow you to proceed anyway.

You need to be aware if your data is not normal. It may prevent the test from working properly, but it may also reveal problems with your data that you are not aware of.

 

Possible reasons why your data does not follow a normal distribution –

  • There is a genuine and reasonable expectation that data would be uniformly distributed – For example, you are looking for evidence of bias in what should be a random system such as a roulette wheel.
  • Subgroups within your data – If your data shows distinct clusters of normality it suggests that the factor that you are investigating is not consistent across the sample group or target population. It might also imply that the data was collected or processed inconsistently or conglomerated from separate sets.
  • Exponential growth – A good example of this would be the way a story or joke is shared by Twitter users. Because each retweet increases the pool of people who can make further re-tweets, the most successful Twitter posts will tend to be dramatically more shared than the least. You can expect to see this pattern a lot with social media data.
  • Fraud – You need to be careful here, but it is not uncommon for people who are falsifying data, but who do not have an in-depth knowledge of statistics, to do this by generating a uniformly random number series within a specified range, rather than a normally distributed one. You should be suspicious of truly uniform distributions within a large set or subset of numbers unless there is a very good explanation. The scarcity of uniform distributions within nature means that, with large groups of data, it is difficult for them to happen by accident, or as a result of innocent mistakes that don’t happen to involve random number generators. But…
  • It can still happen by chance – As previously stated, you will never be able to be absolutely certain about data unless you have all of it, and if you have all of it, you don’t need to make predictions about it.

Statistics = Confidence not certainty

You can never get away from chance when you are working with samples. Statistics does not allow you to make definitive statements about what is happening, but it allows you to determine how confident you can be in your predictions.

“My data is not normally distributed, what do I do?”

Start by figuring out why your data isn’t normally distributed.

If your data is showing clusters, the best approach is going to involve trying to untangle the subgroups from each other. Obviously, this may reduce the sample size below the level required for good quality data.

In some cases, especially with data associated with exponential growth, data that isn’t normally distributed can be “transformed” to meet a normal distribution. However, this is very context-specific.
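One common transformation for this kind of data is taking the logarithm of each value. The sketch below is an illustration under assumptions of my own: the "share counts" are made up with Python's random.lognormvariate (values whose logarithm is normally distributed), and the parameters 3 and 1 are arbitrary. It is not a recipe for any particular real data set.

```python
import math
import random
from statistics import mean, median


def log_transform(values):
    """Natural log of each value. Only valid for values greater than zero."""
    return [math.log(v) for v in values]


# Made-up, heavily skewed "share counts": a few huge values, many small ones.
rng = random.Random(7)
shares = [rng.lognormvariate(3, 1) for _ in range(1000)]
logged = log_transform(shares)

if __name__ == "__main__":
    # In skewed data the mean is dragged well above the median.
    print(f"raw:    mean = {mean(shares):.1f}, median = {median(shares):.1f}")
    # After the transformation the two should sit close together.
    print(f"logged: mean = {mean(logged):.2f}, median = {median(logged):.2f}")
```

A mean and median that sit close together are only a rough hint that the transformed data is more symmetrical. They are not proof of normality, and any analysis done on the transformed values describes the logged data rather than the original counts.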

 

Variance

As you should now realise, the problems associated with small data-sets are a function of their increased exposure to chance.

You should also understand that representing data from a group of samples just by indicating the mean values provides no information about the level of variability within that group.

Variance describes the extent to which individual values fall close to the sample mean.
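To make that concrete, here is a small sketch with two invented groups that share the same mean but differ wildly in how closely the values hug it. It assumes the usual sample variance, which is the sum of squared distances from the sample mean divided by one less than the number of values. The function name sample_variance is mine, and Python's statistics.variance should give the same result.

```python
from statistics import mean, variance


def sample_variance(values):
    """Sum of squared distances from the mean, divided by (n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Two invented groups with the same mean of 50.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

if __name__ == "__main__":
    for name, group in (("tight", tight), ("wide", wide)):
        print(f"{name}: mean = {mean(group)}, "
              f"variance = {sample_variance(group)}, "
              f"library check = {variance(group)}")
    # tight has a variance of 2.5, wide has a variance of 1000
```

Both groups would produce identical bars on a chart of mean values, which is exactly why the mean alone tells you nothing about variability.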

How random happens

With the examples of random numbers given above, the numbers come from a machine, and machines suck at random. With real data, the values will result from a natural process of some type which, while not random either, will in turn be influenced by other processes, and so on until we have split enough hairs that quantum processes, which may actually be random, are involved. At any rate, it has long since become impossible to keep track.

Remember that variance doesn’t spring into existence just because the universe hates you (although that’s my go-to explanation for a lot of other stuff). It comes from those aforementioned external influences, which we call factors.

Generally, if you are doing statistics, it’s because you wish to investigate one or more of these factors, but trying to do this doesn’t make all the other factors go away and stop influencing your samples, because even if the universe doesn’t hate you, it’s not about to start doing you any favours.

 

Fitting it together

So..

Normality describes the pattern that data most often falls in, and variance describes how precisely this actually occurred. The variance is determined by all of the factors acting on your sample values other than the one that you are using to define your target population.

Considering the assumptions that you have made will often allow you to identify some of the most important factors ahead of time, understand their interaction with the one you wish to investigate and help you to assemble sample data that is more representative of the target population.

Variance reduces your ability to make accurate predictions about the target population from your sample values, and this in turn decreases your confidence in those predictions.

If you can’t identify or correct for the influence of additional factors on your data, you can oppose the resulting variance by increasing the number of samples that you take.

 

Next time on Stats – Practical stuff will happen

So how do we indicate variance and quantify confidence?

Come back for the next article in this series, in which I will finally start to talk about the actual process of data manipulation, starting with calculation and appropriate use of standard deviation and standard error.
