Wednesday, November 26, 2008

Binning Algorithms for Metagenomic Sequencing

One of my assistants, SM, who is at least as smart as me and twice as hard working, wrote to ask my advice.

SM: Do you know of any good binning algorithms for metagenomic sequencing?
DL: Huh? What does "binning algorithms for metagenomic sequencing" mean?

SM has not given me an answer. Either she assumes I am joking, and actually do know (which I don't), or she assumes it would take far too long to explain it to me (which I will pretend to resent). So now I shall try to reckon out what "binning algorithms for metagenomic sequencing" means on my own.

Metagenomics, according to my sources (Wikipedia) "is the study of genetic material recovered directly from environmental samples." So, you take a pinch of garden dirt, extract all the DNA in it and then set out to study it in some way. You are metagenomisizing.

Sequencing, in the context of genetics, means figuring out the sequence of DNA bases (A's, T's, G's and C's) that make up part of the genome of an organism. So metagenomic sequencing presumably is taking the DNA from your pinch of dirt, then trying to figure out the sequence of DNA bases that make up all the genomes of all the organisms whose DNA is jumbled together in that dirt. A pinch of dirt, I am guessing, has DNA from hundreds of types of bacteria, a huge number of types of fungi, various protozoans and whatever else has dropped seeds, pollen, poo, tissue or hair in that vicinity in the recent past. And much of that DNA isn't going to be whole chromosomes, but whatever bits and pieces are still mostly intact after all that pooing and shedding and biodegrading. You'll have a real mishmash.

This, I suspect, is where the "binning algorithm" comes in. Binning is any process where you have a large number of elements and you want to separate them into a smaller number of categories. A binning algorithm would be a set of rules one uses to make those decisions on categorization. In the context of metagenomics, I'm guessing that each bin represents a species. You have a snippet of DNA and you need to assign it to an organism, so you don't just think that every bit of DNA is another organism, and you want to get a sense of how much representation you have of each species. So the set of rules you use to assign snippets of DNA extracted from your pinch of dirt to different species is your Binning Algorithms for Metagenomic Sequencing. I think.
My friend DS works on this kind of stuff. I'll write to him and ask.

UPDATE:
I wrote to SM and DS and asked:
Will one of you tell me what "binning algorithms for metagenomic sequencing" means?
I know what each word means, but I could come up with three or four very different guesses as to what the whole phrase means. What does each bin represent?

DS writes: [Bins represent] Taxa. In metagenomic sequencing, you get a soup of reads from all the strains of microbes present in your sample. "Binning" is the process of trying to guess which species each read comes from (or genus, or kingdom for that matter).

All methods in the literature so far are "supervised", meaning that you can only assign a read to a taxon bin if you know something about that taxon in advance (e.g., you have an isolate genome). However, environmental samples may contain previously unknown taxa: new bacterial divisions are still being discovered fairly rapidly, and at the strain level of course nearly everything is novel. A supervised binning process ought to throw up its hands at sequences from novel taxa, since they don't match any known bins. An "unsupervised" process would create new bins on the fly, in order to lump together reads that seem to be related to each other, independent of reference sequences. No published methods do that yet, though.

The accuracy of binning varies dramatically depending on the complexity of the community, the read length, the phylogenetic resolution you're asking for, and many other parameters.

Hope this helps,

-ds
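
DS's description of supervised binning can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it compares tetranucleotide (4-mer) frequencies by cosine similarity, which is one simple composition-based approach and not necessarily what any published tool does. The names kmer_profile, cosine and bin_read, the 0.5 cutoff and the toy reference sequences are all hypothetical. A read that resembles no reference is left "unassigned", which is the supervised method throwing up its hands.

```python
from collections import Counter
from itertools import product
import math

K = 4  # k-mer length (assumption: tetranucleotides)
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]  # all 256 4-mers


def kmer_profile(seq, k=K):
    """Return the normalized k-mer frequency vector of a DNA sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers made purely of A, C, G, T are counted; others are ignored.
    total = sum(counts[m] for m in KMERS)
    if total == 0:
        return [0.0] * len(KMERS)
    return [counts[m] / total for m in KMERS]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def bin_read(read, reference_profiles, min_similarity=0.5):
    """Assign a read to the most similar reference taxon, or 'unassigned'.

    reference_profiles maps a taxon name to its k-mer profile. The
    min_similarity cutoff is an arbitrary illustrative value.
    """
    profile = kmer_profile(read)
    best_taxon, best_score = None, 0.0
    for taxon, ref in reference_profiles.items():
        score = cosine(profile, ref)
        if score > best_score:
            best_taxon, best_score = taxon, score
    if best_taxon is None or best_score < min_similarity:
        return "unassigned"  # no known bin fits: a possible novel taxon
    return best_taxon


if __name__ == "__main__":
    # Toy "isolate genomes" standing in for known taxa (made-up sequences).
    references = {
        "at_rich_taxon": kmer_profile("ATATATTTAAATATTA" * 20),
        "gc_rich_taxon": kmer_profile("GCGCGGCCGCGGGCCC" * 20),
    }
    for read in ["TTTAAATATTAATATATTTAAATATTA", "CACACACACACACACACACA"]:
        print(read, "->", bin_read(read, references))
```

An unsupervised method, as DS describes it, would instead cluster the read profiles against each other and create new bins on the fly; that is not attempted here.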

Tuesday, November 25, 2008

Demographics of Science!

African Americans are generally underrepresented, both in the universities, and in the sciences. Berkeley is no exception in this case.

During my time in grad school I have interviewed well over 100 undergraduates who were applying to work with me, and taken on (as volunteers or paid workers) about 30 of them. Currently, I have 18 undergraduate collaborators. I've not given a great deal of thought to the demographics of this group, other than to notice that the great majority of my applicants (and therefore of my assistants) are female. A recent conversation (about Pres. Elect Obama) made me stop and think about the race and religion of this group. It is a very diverse group. I have had assistants who are Christian, Jewish, Hindu, Muslim and non-religious. Maybe other religions, I don't know. I have had assistants whose ancestors (or they themselves) came from East Asia, South Asia, the Middle East, Eastern Europe, Western Europe, Pacific Islands, Latin America and possibly other places I am not aware of. They have been male and female, heterosexual and homosexual. There are few places in the world where I could have ended up with a more diverse group, but I have no one of obvious African descent.

African Americans are not represented in my lab for a simple but sad reason. I have had not one African American applicant (that I am aware of), out of maybe 120. It is striking that African American representation in this group is lower than among our nation's elected officials. I am not sure exactly why this is, or what combination of bias, cultural factors and public policies is to blame, but I know this is one area where African Americans don't yet seem to have made sufficient inroads.

Saturday, November 22, 2008

Reader JTE asks:

Q: What does

two individuals with the same genotypes, except for those genes determining sex (which in some species don't exist, where sex is environmentally determined),

mean?

A: I'm glad you asked.

It means that if I had one missing or dysfunctional gene on my Y chromosome (or was XX instead of XY), I would be phenotypically female, but the rest of my genome would be the same as it is now. A great many aspects of my physical, chemical, social and mental being (my phenotype) have been altered by the effects of this one gene, which acts as a sex switch. Switch on maleness, and a whole bunch of aspects of phenotype are altered. Don't switch it on, and you get a different phenotype.

In some species, there are no X and Y chromosomes, or anything equivalent, to act as a sex switch. Instead, whether an individual develops as a male or a female is determined by the environmental conditions which prevail at a certain point in development. In alligators, for example, there is no genetic determination of sex. Instead, if the temperature around the egg is above a certain threshold at a certain point in development, the alligator becomes one sex (I think male, but I don't actually remember). If it is colder than that threshold, you get a female alligator. Many of the aspects of the switch are the same; only the first step of the switch is very different.

So my colleague was pondering the fact that two individuals with similar, or even identical, genotypes can have importantly different phenotypes, based on the action of this switch. This means that whether this switch is on or off can greatly affect the actions of other genes, and therefore the effects those other genes have on the survival and reproductive success of the organism.

Thursday, November 20, 2008

Intersexual Correlation

A colleague wrote to ask me what I thought about an idea he'd had. He was thinking about the fact that one could have two individuals with the same genotypes, except for those genes determining sex (which in some species don't exist, where sex is environmentally determined), and end up with significantly different phenotypes. In some traits (e.g. hair color) these two individuals would be expected to have very similar traits, in others (e.g. genital morphology) they would be expected to be very different, and perhaps in some cases uncorrelated or negatively correlated. He wondered if this might affect the ability of individuals to choose mates who would produce highly successful offspring. For instance, a female sizing up a male would have a better sense of what that male's sons would look like than what his daughters would look like. A big, very masculine male might tend to have oversized and somewhat unattractive daughters. My colleague wondered if this might confuse things enough to slow down the action of sexual selection, and allow a greater genetic diversity to remain in the population than would otherwise be the case. I found this a very interesting question, and wrote the following reply:

There is a body of literature on the degree to which natural selection on the traits of one sex will affect the traits of the other sex. People often use the term "correlated evolution" to describe this sort of thing. When there is a correlation (positive or negative) in a trait between the female expressed genotype and the male expressed genotype, I've seen the phrase "intersexual correlation." I am not terribly familiar with this literature, I'm afraid.

This recent paper is the closest thing I know of to what you are talking about.

Whether any of this would lead to a greater genetic diversity in the population, I am not sure. The effects of natural selection may be somewhat weaker, as traits that are expressed in one sex but not the other are less often expressed, and therefore less often subject to selection (an epistatic interaction in effect). In the case of sexual selection, my guess would be that as long as degrees of intersexual correlation in particular traits evolve more slowly than do what cues individuals use to choose mates, choosers should evolve to focus on characteristics that are good indicators of fitness in both male and female offspring. I think this will generally be the case, as there is clearly very strong selection against those who use misleading cues in mate choice. I am not aware of any reason to think there would be rapid change in the degree of intersexual correlation in a wide range of traits all at once. As long as there is any consistently reliable signal available, the family lines that use it should tend to do better than the population average.

It raises an interesting set of questions, though I am not sure how many of them have been addressed in the literature.

Does this answer your question? If you want more expert answers we could ask Monty Slatkin, who I am sure has thought about this in some detail at some point.

Tuesday, November 18, 2008

Misquotes of Science!

Science is the least precise way of describing the world, except for all those others that have been tried.

Sunday, November 16, 2008

Cannibalism, entropy, economics and consumerism.

Among the millions of species of organisms out there, you can find a species that specializes in eating almost anything. There are lion-poo specialists, feather-barb specialists, lichen specialists, you-name-it specialists. There are also lots of generalist species, and many of these generalists engage in cannibalism. But no species is a cannibalism specialist. I can say this with confidence, even though we don't know what most species eat in any detail. A species of dedicated cannibals would quickly run out of energy. Everything an organism does burns energy that cannot be retrieved. Thermodynamics and all that. If a population is to withstand the ravages of entropy for any time at all, there has to be a sizable inflow of concentrated nutrients and well-ordered energy. A population of cannibals has outflows but no inflows, and quickly changes food supply or goes extinct. Engaging in cannibalism can be beneficial for short periods under specific circumstances, but you can't eat all conspecifics all the time. That way lies rapid extinction. Similarly, an ecosystem cannot persist for any period of time without massive inward fluxes of organized energy. Sunlight, geothermal chemicals or organic detritus from one of these two are necessary inputs to every ecosystem we have ever come across. Without that, organized energy in the system quickly declines until life can no longer be supported.

This same logic applies in economics. Pyramid schemes and speculative bubbles ultimately must collapse, because there is no underlying production of valuable stuff to support the outflows of capital of those involved in the speculation. Extending the analogy only slightly further, we see why a "consumer services based economy" cannot long persist. We import the carpet-cleaning machine from China, but we cannot export the carpet-cleaning service. We import the yoga mats but can't export the yoga lessons. We bring in coffee beans but can't export the latte. The consumer services part of the US economy (the biggest part) sends money out but brings effectively no money in, feeding instead on money that is already in the system.

The fact that our consumer economy lasted as long as it did/has is a testament to why economics is a social science, rather than a natural one. Humans are inherently illogical, and economics needs an understanding of that as much as it needs equations to understand the ways in which we are logical. Economic theory worked out for any other species would perform terribly for humans, meaning economics is by necessity anthropocentric, and therefore a social science. This has allowed an economy with few inflows to persist by inventing imaginary inflows, known as international borrowing. Americans may not be able to give you anything back for your stuff, but if you lend us the money to buy it from you, we will promise that at some point in the future we will borrow more money from someone else to pay you back with interest. Stated this way, it is an obvious pyramid scheme. But we have preferred to think that because our economy is large, because it has been dynamic, we soon would no longer need to borrow. Instead, I hope, we have figured out that we have to consume at a level closer to the level at which we produce. Otherwise we are just eating our children's future earnings, which is a bit too close to cannibalism for my taste.

Saturday, November 15, 2008

Tough Love

A few months back, one of my first and best lab assistants, LZ, was graduating. We were at a ceremony/lunch for her and the other students who had received an undergraduate research fellowship.
I said to her, "now that you are graduating, I want honest feedback on how I can improve as a mentor, and what things I should think about changing." She copped out, going into a long list of all the things I do right, then asking me what things I thought I needed to work on. She's a clever one, if overly tactful.

I said, "that's a total cop-out answer." She persisted in answering without answering, and in pushing me to answer my own question, so I did.

I told her that there were two main things I felt I really needed to figure out better. First was the balance between autonomy (allowing students to do what they want in their own projects, even if it might not work) and direction (giving students a project that is very likely to work, even if it is not exactly what they want to do). Second, I thought I was pretty good at picking good students, and at mentoring good students, but not so good at knowing what to do about the uninterested students who I mistakenly hired and couldn't really motivate. I tend to assume everyone on my team is competent, interested and motivated, and when any of these assumptions is violated, it takes me a while to convince myself that there is little doubt to give the benefit of, and a longer while to figure out what to do about it. In typical LZ fashion, she consented without actually stating agreement.

Recently, I have been trying to tackle the second problem, approaching students who I didn't feel were getting it done and letting them know where I thought they needed to improve. The results so far have been quite positive, and I am hopeful that despite LZ's concerted effort to be unhelpful, my conversation with her has helped me improve my mentoring.

So there.

Thursday, November 13, 2008

On a related note:

University of California's endowment loses $1 billion in value.

"UC Berkeley spokesman Dan Mogulof said that if the financial markets continue their downward slide in coming years, there could be future reduction in endowment support for scholarships, research and funding to recruit and retain faculty, among other things."

Graduating into a Depression

The number of professorships in the country does not vary much from year to year. Usually the number of positions opening up approximately equals the number of professors dying, retiring or moving to other jobs. Occasionally a new campus opens or there is a major expansion of enrollment, and a bunch of new positions are created. Other times the economy sucks (to use the technical term) and as a cost-cutting measure positions are retired or left vacant for a few years. As an example, the budget problems California has been having ever since the dot com bust have caused UC Berkeley to greatly increase the average time between one professor leaving and another being hired.
Right now, professorships are hard to get and I fully expect them to get harder. Somebody or other, a housing economist I think, was on NPR today predicting that the housing market will hit bottom in another three years. I expect the academic job market to hit bottom around the same time, or possibly a year or two later.

Which brings me to me. I will be getting my PhD in a little under a year. I then expect to spend two or three years as a Post-Doctoral researcher in Germany. That should have me searching for an assistant professorship just about the time there are no professorships of any sort to be had.

President-Elect Obama, I think an investment in our nation's universities and research institutes would be a great way to stimulate the economy, improve education and develop the technologies we need to deal with environmental issues and health care. Sooner is better than later. I have this three-year plan.

Monday, November 10, 2008

Team of Science

We attempted to get me and my entire team of undergraduate rotifer wranglers into our tiny lab space all at once. Two people couldn't make it, but 12 of us plus a photographer jammed in. The room is 12 m^2, but about half of that space is occupied by counters, furniture and large equipment. Hopefully at my next job I will have a larger lab space, a smaller team, or both.

Sunday, November 09, 2008

Population Doubling

As I am preparing for my talk, I am doing some intense demographic analysis of my rotifer data set. One interesting factoid I have calculated is that the population doubling time, assuming I could keep an infinite number of rotifers and didn't have to get rid of any, is 28 hours.

A related calculation: If I started with one newly hatched rotifer and let the population grow (with my average age-specific reproductive rates and death rates), after one month I would have 159 million rotifers.

The average volume of a rotifer is about 0.001 cubic millimeters. A million of them pressed together makes one milliliter. A billion makes a liter. 10^21 would be a cubic kilometer. Earth's oceans have a total volume of 1.347*10^9 cu km, meaning I would need 1.347*10^30 rotifers to fill them completely with no space between rotifers. At the demographic rates they maintain in my lab, assuming I didn't cull any, this would take about 115 days.
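
The unit conversions behind this kind of estimate can be checked with a short script. It is a sketch under a simplifying assumption: pure exponential growth from a single rotifer at a fixed 28-hour doubling time, ignoring the age-specific reproductive and death rates that the real calculation uses, so its day count is only approximate. The function names are made up for illustration.

```python
import math

ROTIFER_VOLUME_MM3 = 0.001    # average rotifer volume, from the post
OCEAN_VOLUME_KM3 = 1.347e9    # total ocean volume, from the post
DOUBLING_TIME_HOURS = 28.0    # population doubling time, from the post

MM3_PER_KM3 = (1e6) ** 3      # 1 km = 10^6 mm, so 1 km^3 = 10^18 mm^3


def rotifers_per_km3():
    """Number of rotifers that fit in one cubic kilometer with no gaps."""
    return MM3_PER_KM3 / ROTIFER_VOLUME_MM3


def rotifers_to_fill_oceans():
    """Number of rotifers needed to fill the oceans completely."""
    return OCEAN_VOLUME_KM3 * rotifers_per_km3()


def days_to_reach(target, start=1.0, doubling_hours=DOUBLING_TIME_HOURS):
    """Days of steady doubling needed to grow from start to target individuals.

    Assumes idealized exponential growth with a constant doubling time.
    """
    doublings = math.log2(target / start)
    return doublings * doubling_hours / 24.0


if __name__ == "__main__":
    needed = rotifers_to_fill_oceans()
    print(f"rotifers per cubic km:  {rotifers_per_km3():.3e}")
    print(f"rotifers to fill ocean: {needed:.3e}")
    print(f"days of doubling:       {days_to_reach(needed):.1f}")
```

Under these assumptions the script gives about 10^21 rotifers per cubic kilometer and roughly 117 days of uninterrupted doubling to fill the oceans. A calculation that starts from one newly hatched rotifer and uses the lab's age-specific rates will differ somewhat.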

I only have time, container space and staff to keep track of 450 rotifers at a time, so I end up culling a significant portion of my population every day.

Rotifer Demography Talk Wednesday

I'm a biology grad student, but my funding and my fellowship are all through the Demography department. One service I return to the Demography department is to attend their weekly seminar and, whoever the speaker is, suggest biological literature relevant to her topic of study. Some demographers take better to this than others. Most seem to appreciate the new perspective, even if they are not really interested in thinking about humans in a biological context. (For the record, I also go to biology talks and bring up demographic concerns.)

The next speaker I will have to deal with differently, because the speaker this coming week is me. I'll be presenting on demographic aspects of my rotifer research. Age-specific mortality and reproduction. Effect of food supply on longevity. Infant mortality. I'll get into the biology a bit too, but mostly they'll want to hear about the demographics. If I were in my audience, I would suggest more of a focus on the biology.

Wednesday, November 05, 2008

Who's next?

Of the thirteen students working with me on rotifer work, only one is a white Christian heterosexual male. This never occurred to me before yesterday, when we were sitting around the lab, talking about the fact that should Obama win, he would be the first POTUS who was not a straight white Christian man.

I asked my students if they thought, now that we were getting a non-white president, we would have a female president, a non-Christian president or a homosexual president first. Some said we would have a woman soon, others said America would elect a Jewish president before a woman. Everyone agreed that there is still too much bias against homosexuals to have an openly gay president any time soon. I asked them if they thought Americans would ever elect a scientist as president. They all said no, and a couple of them said that was probably a good thing.

Three of my students are naturalized citizens, and therefore are barred by our constitution from running for president. But the other ten, in my opinion, should all have equal shots at the White House. The fact that they are all science students studying evolution at Berkeley means that this chance is zero, which is bearable so long as it is an equal and unbiased zero.