Red Run fecal bacteria
Approximately 50% of fecal bacteria are of human origin,
and the other 50% come from waterfowl, birds, dogs, cats, etc.
Sequence-Based Source Tracking of Escherichia coli
Based on Genetic Diversity of β-Glucuronidase
Jeffrey L. Ram, Raquel P. Ritchie, Jianwen Fang,
Felicitas S. Gonzales, and James P. Selegean
High levels of fecal bacteria are a concern for recreational waters; however, the source of contamination is often unknown. This study investigated whether direct sequencing of a bacterial gene could be utilized for detecting genetic differences between bacterial strains for microbial source tracking. A 525-nucleotide segment of the gene for β-glucuronidase (uidA) was sequenced in 941 Escherichia coli isolates from the Clinton River–Lake St. Clair watershed, 182 E. coli isolates from human and animal feces, and 34 E. coli isolates from a combined sewer. Environmental isolates exhibited 114 alleles in 11 groups on a genetic tree. Frequency of strains from different genetic groups differed significantly (p < 0.03) between upstream reaches (Bear Creek–Red Run), downstream reaches, and Lake St. Clair beaches. Fecal E. coli uidA sequences exhibited 81 alleles that overlapped with the environmental set. An algorithm to assign alleles to different host sources averaged approximately 75% correct classification with the fecal data set. Using the same algorithm, the percent of environmental isolates assignable to humans decreased significantly between Bear Creek–Red Run (30 ± 3%) and the beaches (17 ± 2%) (p < 0.05). Birds accounted for approximately 50% of assignable environmental isolates. For combined sewer isolates, the same algorithm assigned 51% to humans. These experiments demonstrate differences in the frequency of different E. coli strains at different locations in a watershed, and provide a “proof in principle” that sequence-based data can be used for microbial source tracking.
PCR, polymerase chain reaction;
uidA, gene for β-glucuronidase
High levels of waterborne fecal bacteria are associated with increased risk of disease for recreational swimmers (Cabelli, 1977; Dufour, 1984; USEPA, 1986; Prüss, 1998). Because of its association with fecal matter, levels of E. coli are the key regulatory measure of the healthfulness of freshwater streams and lakes. For example, in Michigan, beach water showing more than 300 E. coli colony forming units (cfu) per 100 mL on a single day is considered out of compliance (Michigan Department of Environmental Quality, 1999), resulting in frequent beach closures in southeastern Michigan beaches (Macomb County Health Department, 2000). Furthermore, USEPA has estimated that at least 40000 km of streams and coastal waters nationally have levels of bacteria that exceed health standards (Malakoff, 2002).
Recently, genetic analysis techniques have been developed to identify sources of environmental bacteria. As described at a recent USEPA-sponsored conference on microbial source tracking (USEPA, 2002), these techniques include analysis of repetitive-DNA fragment lengths (Dombek et al., 2000) and genetic analysis using restriction enzymes, particularly focusing on sequences in or near 16S ribosomal DNA sequences (ribotype analysis) (Carson et al., 2001; Samadpour, 2002). Although these previous techniques depend on genetic differences between E. coli strains, none have been based on direct knowledge of the specific sequence differences between the strains. The present study describes a new technique based on direct analysis of sequences at a particular genetic locus, the gene for β-glucuronidase.
β-Glucuronidase was chosen for this study because of its frequent use in detecting E. coli and because previous studies had indicated that β-glucuronidase has a substantial amount of genetic variation. β-Glucuronidase activity is observed in approximately 95% of naturally occurring E. coli (Martins et al., 1993) and forms the basis of the Colilert, mColiBlue, and modified mTEC methods of enumerating E. coli (USEPA, 2001). The entire 1812-nucleotide gene for β-glucuronidase, known as uidA or gusA (Monday et al., 2001), has previously been sequenced in strain K12 (Blattner et al., 1997), four pathogenic strains of E. coli (Hayashi et al., 2001; Monday et al., 2001; Perna et al., 2001; Welch et al., 2002), and two strains of the closely related Shigella flexneri (Jin et al., 2002; Wei et al., 2003). Eleven variants of a 126-nucleotide region of the gene in 50 isolates have been described (Farnleitner et al., 2000). Some pathogenic forms of E. coli, such as type O157:H7, have more than 20 single nucleotide base changes from the sequence of the β-glucuronidase gene of the common laboratory strain of E. coli (strain K12) and also have a two-base frame shift insertion that yields a truncated inactive gene product (Monday et al., 2001). The presence of a substantial amount of genetic variation in the β-glucuronidase gene suggested that it therefore might be useful as a genetic tool for tracking sources of E. coli in the environment.
Genetic differences or similarities between populations of E. coli were analyzed in E. coli strains isolated from beaches on a lake and from a river and its tributaries that empty into the lake not far from the beaches. The river had been suggested as a possible source of contamination of the beaches, and therefore data were analyzed to determine whether the E. coli populations were similar at these nearby locations. In addition, β-glucuronidase sequences of waterborne E. coli and fecal reference samples from putative hosts were compared to determine likely host sources for the observed contamination.
The geographic focus of this study was the Clinton River watershed in southeastern Michigan and two beaches on Lake St. Clair, into which the Clinton River flows (Fig. 1). Lake St. Clair is a large lake (approximately 1100 km²) along the connecting water channel between Lake Huron and Lake Erie (Great Lakes Information Network, 2004).
Fig. 1. Sampling sites on Lake St. Clair and the Clinton River in Michigan. Sites 1 through 11 were sampled in the present study. Sites 12 through 14 are relevant sites discussed from other studies (Selegean et al., 2001; M. Samadpour, unpublished data [microbial source tracking study of the Blossom Heath Beach, report for Macomb County Department of Public Works, 2001]).
MATERIALS AND METHODS
Water samples were collected from Metro and Memorial Park beaches on Lake St. Clair (Fig. 1, Sites 1 and 2, respectively) and from the Clinton River watershed (Fig. 1, Sites 3–10). Memorial Park Beach has had an upward trend of bacteria levels since 1988 and beach closures as many as 84 d out of the summer swimming season (Macomb County Health Department, 2000). Metro Beach is a regional beach that is patronized by more swimmers than Memorial Park Beach. The Clinton River watershed traverses a suburban and rural area northeast of the city of Detroit and is an Area of Concern (USEPA, 2004). Site 10 is in Red Run, which is the terminus of Twelve Towns Drain, a combined sewer that is normally pumped to a sewage treatment plant in southwestern Detroit but sometimes overflows into Red Run. Bear Creek (Site 9) is a highly polluted tributary to Red Run; daily monitoring of Bear Creek for 68 d indicated a median concentration of E. coli of 9600 colony forming units per 100 mL (Selegean et al., 2001). The lower Clinton River (starting at Site 6) diverges into a small “delta” comprised of the main Clinton River mouth (Site 3), a boating channel (Site 4), and a flood control branch (Site 5).
At Lake St. Clair beaches, most water samples were collected approximately 3 m from the shoreline using a long pole. In addition, the Macomb County Health Department collected water for the project in waist-deep water during their biweekly recreational water monitoring collection. On each collection date, eight to ten independent water samples greater than 100 mL in volume were collected at each site in sterile Whirlpak bags or bottles and stored on ice until processing within four hours of collection. Water was collected on seven days during June to October, 2001.
Human fecal samples were obtained from volunteers according to procedures approved by the Wayne State University Human Investigation Committee. Fecal samples from animals were collected in the field at local parks, from pet stores and kennels, from pet or domesticated animal owners, and from the Wayne State University Laboratory Animal Resources animal facilities. Animal fecal samples included samples from duck, goose, sea gull, cat, dog, cow, and horse. Water samples were also collected on two occasions from the Twelve Towns Drain combined sewer (Site 11, Fig. 1).
Isolation of Escherichia coli
One-milliliter aliquots from each water sample were cultured either with Colilert-18 medium (IDEXX, Portland, ME) or on 3M (St. Paul, MN) Petrifilm E. coli/Coliform Count Plates for 20 to 24 h at 35.5°C. Presumptive E. coli colonies on Petrifilm were identified by their blue color and gas production (Vail et al., 2003). For isolation of individual colonies (clones) of E. coli, one drop of positive (fluorescent) Colilert cultures or a candidate colony from Petrifilm was streaked onto MacConkey plates (PML Microbiologicals, Mississauga, ON, Canada) and cultured for 24 h at 35.5°C. Well-isolated pink colonies on the MacConkey plates were then inoculated into 1 mL of Colilert-18 medium and cultured overnight at 35.5°C for testing as E. coli having β-glucuronidase activity. Positive cultures were frozen after addition of 0.4 mL of 50% sterile glycerol. At each stage of the procedure an appropriate number of cultures were analyzed and isolated so that at the end no more than two isolates were obtained from each original water sample and no more than one isolate was included in the final archive from each of the initial Colilert tubes or Petrifilm colonies.
Fecal E. coli isolates were obtained by streaking a small amount from the interior of a fecal sample onto MacConkey plates and incubating for 24 h at 35.5°C. Well-isolated pink colonies were inoculated into Colilert-18 for detection of β-glucuronidase activity. Positive cultures were frozen after addition of 0.4 mL of 50% sterile glycerol. In most cases, only one positive isolate was archived from each fecal sample.
Polymerase Chain Reaction to Obtain Samples for Sequencing
Bacterial cultures had a high enough titre so that sufficient DNA for efficient polymerase chain reaction (PCR) could be released by the freezing, thawing, and heating process inherent in the culture storage and PCR process itself. Polymerase chain reactions were set up with 24 μL PCR master mix (88 μL 10X PCR buffer, 16.5 μL 50 mM MgCl2, 11 μL 10 mM dNTPs, 5.5 μL of 5U μL−1 Taq polymerase, all reagents from Invitrogen [Carlsbad, CA]; plus 385 μL sterile water), 2-μL primers each at 10 pmol μL−1, and 1 μL of bacterial culture. After mixing and centrifuging, PCR reaction mixtures were heated rapidly to 94°C for 10 min; subjected to 30 PCR cycles of annealing at 60°C for 45 s, extension at 72°C for 45 s, and denaturation at 94°C for 1 min; then held at 72°C for 10 min for final complete extension, and then cooled to 4°C. All PCR runs included a positive control of an E. coli isolate previously demonstrated to produce the expected PCR product and a negative control (1 μL of sterile water in place of the bacterial culture). All PCR preparation and reactions were done in a room on a separate hallway from the rooms in which (i) bacteria were cultured and (ii) PCR products were analyzed by electrophoresis and prepared for sequencing.
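As a quick check of the volumes listed above, the following sketch sums the master mix components and estimates how many 24 μL aliquots one batch would provide. The dictionary keys and variable names are illustrative assumptions and are not part of the original protocol.

```python
# Master mix components in microliters, copied from the text above.
master_mix_ul = {
    "10X PCR buffer": 88.0,
    "50 mM MgCl2": 16.5,
    "10 mM dNTPs": 11.0,
    "Taq polymerase (5 U/uL)": 5.5,
    "sterile water": 385.0,
}

total_ul = sum(master_mix_ul.values())

# Each reaction receives 24 uL of master mix (primers and culture are added separately).
aliquot_ul = 24.0
full_reactions = int(total_ul // aliquot_ul)

print(f"Total master mix: {total_ul:.1f} uL")
print(f"Full 24 uL aliquots per batch: {full_reactions}")
```

Under these numbers a single batch covers a little more than 21 reactions, which leaves a small margin for pipetting losses.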
The primers were designed to amplify a 587-nucleotide segment of the E. coli β-glucuronidase gene, extending from nucleotide 298 to 884 of GenBank sequence S69414. The forward primer (298F) was 5′-AATAATCAGGAAGTGATGGAGCA-3′, and the reverse primer (884R) was 5′-CGACCAAAGCCAGTAAAGTAGAA-3′. When tested for specificity with strains of E. coli, Enterobacter, Klebsiella, Proteus, Pseudomonas, and Salmonella, these primers gave PCR products only with E. coli (unpublished data). Also, a simulated “virtual PCR” with these primers against all sequences in GenBank yielded product only with sequences from E. coli. A small percentage of isolates that produced no PCR products with these primers (see Results, below) may be E. coli with variation in the primer regions or non-E. coli that were nevertheless positive on the Colilert test.
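The idea of a “virtual PCR” can be sketched as a simple string search: find the forward primer, then find the reverse complement of the reverse primer downstream of it. The function names below are hypothetical, the template is synthetic, and only exact matches are considered; the software actually used for the GenBank simulation is not described in the text.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


FORWARD_298F = "AATAATCAGGAAGTGATGGAGCA"
REVERSE_884R = "CGACCAAAGCCAGTAAAGTAGAA"


def virtual_pcr(template, forward=FORWARD_298F, reverse=REVERSE_884R):
    """Return the predicted amplicon, or None if either primer site is absent.

    Assumption: exact primer matches on the given strand only.
    """
    template = template.upper()
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# Synthetic template (not a real uidA sequence) carrying both primer sites.
toy_template = ("GGGG" + FORWARD_298F + "ACGT" * 10
                + reverse_complement(REVERSE_884R) + "CCCC")
toy_amplicon = virtual_pcr(toy_template)
print(len(toy_amplicon))

# Expected product length from the stated coordinates (inclusive).
expected_length = 884 - 298 + 1
print(expected_length)
```

The inclusive coordinate arithmetic reproduces the 587-nucleotide product size stated above.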
The PCR products were verified by electrophoresis in 2% agarose in TAE (0.04 M Tris-acetate, 0.001 M EDTA, pH 8.0; Maniatis et al., 1982), followed by staining in ethidium bromide (0.05 mg per 100 mL). Intensity of fluorescent staining pattern, imaged with a Biophotonics (Ann Arbor, MI) Gelprint 2000i system, indicated amount of PCR product. Thus, 1 to 4 μL of PCR product, containing an estimated 200 to 400 ng of DNA, was diluted in 19 μL of sterile water and sequenced in both forward and reverse directions at the Wayne State University DNA Sequencing Facility by ABI Dye Terminator dideoxy cycle sequencing on an ABI 3700 DNA Analyzer (Applied Biosystems, Foster City, CA).
A consensus sequence was determined by aligning the forward sequence with the reverse complement of the reverse sequence and a reference sequence in GenBank (S69414), using the GCG SeqWeb suite of programs (Version 1.2, interface with the Wisconsin Package Version 10.1; Accelrys, 2000). Differences in sequence from the reference sequence were identified as only those loci at which both forward and reverse complement sequences agreed with one another and differed from the reference sequence. Where disagreements or ambiguities occurred between forward and reverse complement sequences, the chromatographic patterns were examined using Chromas software (Technelysium, 1998) to determine the correct sequence, and/or PCR reactions and sequencing were repeated.
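The acceptance rule described above (a difference counts only when the forward read and the reverse complement of the reverse read agree with each other and differ from the reference) can be sketched as follows. This is a minimal illustration that assumes the three sequences are already aligned, gap-free, and of equal length; the function names and the toy sequences are hypothetical and do not reproduce the SeqWeb workflow.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def confirmed_differences(reference, forward_read, reverse_read, first_position=1):
    """Compare two reads of the same fragment against a reference.

    Returns (confirmed, ambiguous):
      confirmed: list of (position, reference_base, observed_base) where both
                 reads agree with each other and differ from the reference
      ambiguous: positions where the two reads disagree and would need
                 chromatogram inspection or resequencing
    """
    reverse_as_forward = reverse_complement(reverse_read)
    if not (len(reference) == len(forward_read) == len(reverse_as_forward)):
        raise ValueError("sequences must be pre-aligned and of equal length")

    confirmed, ambiguous = [], []
    for i, (ref, fwd, rev) in enumerate(zip(reference, forward_read, reverse_as_forward)):
        position = first_position + i
        if fwd != rev:
            ambiguous.append(position)
        elif fwd != ref:
            confirmed.append((position, ref, fwd))
    return confirmed, ambiguous


# Toy data: one confirmed change and one disagreement between reads.
reference = "ACGTACGT"
forward_read = "ACTTACGA"
reverse_read = reverse_complement("ACTTACGT")
confirmed, ambiguous = confirmed_differences(
    reference, forward_read, reverse_read, first_position=331
)
print(confirmed, ambiguous)
```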
Final sequence analysis examined only sequence data of the 525-nucleotide region corresponding to bases 331 to 855 of the S69414 reference sequence. Sequences were encoded, analyzed, and compared using in-house software written in Java, Visual Basic, and Perl, together with Excel functions. The dendrogram illustrated in this paper was constructed using MEGA Version 2.1 (Kumar et al., 2001), using neighbor-joining algorithms and the Tamura–Nei substitution model. Dendrograms were also constructed using other MEGA2 substitution models and bootstrapping methods and with GeneBee programs (Moscow State University, 2004) and the SeqWeb GrowTree program. Sequences were submitted to GenBank and are a subset of 148 uidA sequences with GenBank accession numbers AY447047 to AY447194.
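Tree-building methods such as neighbor joining start from a matrix of pairwise distances between alleles. The sketch below computes the simplest such distance, the number of differing sites, for a few toy sequences. It is not the Tamura–Nei model used for the illustrated dendrogram, and the allele names and sequences are invented for illustration.

```python
from itertools import combinations


def count_differences(seq_a, seq_b):
    """Number of positions at which two aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned and of equal length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a != b)


def distance_matrix(alleles):
    """Map each unordered pair of allele names to its number of differences."""
    return {
        (name_1, name_2): count_differences(alleles[name_1], alleles[name_2])
        for name_1, name_2 in combinations(sorted(alleles), 2)
    }


# Hypothetical four-base "alleles" standing in for 525-nucleotide sequences.
toy_alleles = {"a1": "ACGT", "a2": "ACGA", "a3": "TCGA"}
distances = distance_matrix(toy_alleles)
print(distances)
```

The inclusive range of bases 331 to 855 gives 855 - 331 + 1 = 525 compared sites per allele pair in the real data.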
The frequencies with which various alleles or groups of alleles were observed from different collection sites were compared using SigmaStat 2.0 (Jandel, 1995) to perform chi square or analysis of variance (ANOVA). To avoid limitations inherent in chi square analysis (frequencies should not be less than 5 in more than 20% of the cells of the contingency table), it was necessary to collapse logically related samples and alleles into larger groups. Thus, alleles were grouped according to their position on the MEGA2 dendrogram, and comparisons were made between allele frequencies at the beach and nonbeach (i.e., Clinton River watershed) sites as well as between upstream and downstream sites in the Clinton River. For host source analysis percentages of alleles assigned to various host categories (humans, birds, farm animals, pets) were calculated for each of the seven collection days, and the average percentages at the various collection sites were then compared by ANOVA, followed by pairwise comparisons using the Student–Newman–Keuls method. Except as noted in Results, the power of all illustrated statistical tests was >0.80 and differences were considered significant for p < 0.05.
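The reason for collapsing groups before a chi square test can be illustrated with a small contingency table. The sketch below computes expected counts, checks the rule that no more than 20% of cells should fall below 5 (applied here to expected counts, which is an assumption about how the rule was enforced), and merges two columns. All counts are hypothetical, and the original analysis was performed in SigmaStat, not with this code.

```python
def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in column_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson chi square statistic for a contingency table."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


def too_many_small_cells(table, minimum=5.0, max_fraction=0.20):
    """True if more than max_fraction of expected counts fall below minimum."""
    cells = [value for row in expected_counts(table) for value in row]
    small = sum(1 for value in cells if value < minimum)
    return small / len(cells) > max_fraction


def collapse_columns(table, keep, merge):
    """Return a new table with column `merge` added into column `keep`."""
    collapsed = []
    for row in table:
        new_row = [value for index, value in enumerate(row) if index != merge]
        new_row[keep if keep < merge else keep - 1] += row[merge]
        collapsed.append(new_row)
    return collapsed


# Hypothetical counts: rows are sites, columns are allele groups.
counts = [[20, 4, 4],
          [30, 3, 4]]
print(too_many_small_cells(counts))

merged = collapse_columns(counts, keep=1, merge=2)
print(merged, too_many_small_cells(merged))
print(round(chi_square_statistic(merged), 3))
```

In this toy table, four of six expected counts fall below 5 before merging and none do afterward, which mirrors the motivation for grouping related alleles.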
A total of 1049 E. coli strains were isolated from the Clinton River watershed and Lake St. Clair beaches (Table 1). When subjected to PCR using primers 298F and 884R, these strains gave PCR products of the predicted size in 996 cases, or 95% of the total tested. The remaining isolates gave no PCR product with these primers. High quality sequence data were obtained in both forward and reverse directions from 941 of these PCR products.
Figure 2 shows the frequencies and sequences of 60 alleles that were observed two or more times in these environmental isolates. These sequences differed from the S69414 reference sequence in as few as one to as many as 17 bases. Altogether, these environmental E. coli isolates exhibited 114 different alleles, varying in frequency from 35.6% for allele uidA1 to as few as one isolate, as was observed for 54 of the alleles.
Fig. 2. Frequencies and sequences of the 60 most frequently observed alleles in environmental isolates. Horizontal bars represent the percent of all sequenced isolates, out of 941, that had the indicated sequence. At the right, the sequence of each allele is indicated as differences from a reference sequence, S69414 in GenBank, encoded as xny, where x = the base in the reference sequence, n = the position in the reference sequence at which the base is changed, and y = the base found in the allele. The length of each allele's list of sequence differences is proportional to the number of bases differing from S69414, as represented by the scale on the bottom axis.
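The xny notation described in the caption can be sketched as a pair of helper functions, one that encodes an allele as a list of differences from the reference and one that reconstructs the allele from such a list. The function names and toy sequences are hypothetical; positions are numbered from 331 to match the analyzed region of S69414.

```python
def encode_differences(reference, allele, first_position=331):
    """Encode an aligned allele as xny codes relative to the reference."""
    if len(reference) != len(allele):
        raise ValueError("sequences must be aligned and of equal length")
    return [
        f"{ref}{first_position + i}{obs}"
        for i, (ref, obs) in enumerate(zip(reference, allele))
        if ref != obs
    ]


def apply_differences(reference, codes, first_position=331):
    """Rebuild an allele sequence from the reference and a list of xny codes."""
    bases = list(reference)
    for code in codes:
        ref_base, position, new_base = code[0], int(code[1:-1]), code[-1]
        index = position - first_position
        if bases[index] != ref_base:
            raise ValueError(f"{code} does not match the reference base")
        bases[index] = new_base
    return "".join(bases)


# Toy six-base region standing in for positions 331 to 336.
toy_reference = "ACGTAC"
toy_allele = "ATGTAA"
codes = encode_differences(toy_reference, toy_allele)
print(codes)
print(apply_differences(toy_reference, codes) == toy_allele)
```

Storing alleles this way keeps each record short, since most alleles differ from the reference at only a handful of sites.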
Figure 3 illustrates a dendrogram constructed with MEGA2 of the 60 alleles illustrated in Fig. 2. Alleles in the dendrogram in Fig. 3 are bracketed into several groups of genetically related sequences, Groups A through K. Bootstrap values for this tree (not shown) tended to be low, which is probably not surprising since many of these sequences differ by only one or two bases and their precise topological relationships at particular branch points could vary from one bootstrap resampling to the next even when particular alleles are only one or two branch points removed from their illustrated positions. However, we tested whether the grouping of sequences was reliable by constructing other dendrograms using various MEGA2 models, including Jukes–Cantor, Kimura-2 parameter, Tajima–Nei, and number of differences, as well as SeqWeb and other internet tree-building tools. All agree, more or less, with the groups illustrated here, albeit with some groups changing their rooting relative to other groups. For example, Group D sometimes branched near Group E (as illustrated here), sometimes was “monophyletic” with Group E, and other times branched near the base of Group F. However, we found the illustrated groups to be generally coherent and reproducible through all methods.
Fig. 3. Genetically related groups in a dendrogram of the 60 E. coli β-glucuronidase alleles illustrated in Fig. 2.
Comparisons of Allele Frequencies at Different Geographical Sampling Sites
The frequencies with which isolates from different groups of genetically related alleles occurred at various geographical sampling sites were compared using chi square analysis. To analyze whether the populations of E. coli at the Lake St. Clair beaches differed from those in the Clinton River watershed, environmental sites were grouped into beaches (Memorial Beach, Fig. 1, Site 1; and Metro Beach, Site 2); lower Clinton River (Sites 3, 4, 5, and 6), and Red Run–Bear Creek (Sites 9 and 10).
Figure 4 illustrates the frequencies with which alleles in the various sequence groups, labeled A through K in the tree in Fig. 3, occurred at these geographic sites. A preliminary comparison of beach to nonbeach sequences resulted in too many cells having low frequencies, which was traced to low frequencies of alleles in both Groups D and E. Since Groups D and E are closely related groups on this tree, their alleles were collapsed into one group for further analyses, thereby avoiding the warnings of low cell frequencies. With Groups D and E combined, the difference in allele frequencies between the beaches and all nonbeach sites combined was significant at p < 0.001.
Fig. 4. Frequencies of alleles of Groups A through K at Red Run–Bear Creek (Sites 9 and 10), lower Clinton River (Sites 3, 4, 5, and 6), and Metro and Memorial beaches (Sites 1 and 2). Groups D and E were combined because of low frequencies.
Frequencies of alleles belonging to these genetically related groups at the beaches, Red Run–Bear Creek, and the lower Clinton River are illustrated in Fig. 4. The beaches differed significantly from Red Run–Bear Creek (p < 0.001) and from the lower Clinton River (p = 0.030). Furthermore, Red Run–Bear Creek differed significantly from the lower Clinton River (p = 0.022). Lest one conclude that these tests are so sensitive that all comparisons would be statistically significant, we also made comparisons with the allele frequencies at Sites 7 and 8, which, like Bear Creek–Red Run, are upstream from the lower Clinton River. This analysis indicated no significant difference from the lower Clinton River (p = 0.272; albeit power = 0.609, indicating caution). Allele frequencies at Sites 7 and 8 (combined) did, however, differ significantly from the beaches (p < 0.001).
Relationship to Possible Host Sources
Differences in E. coli populations could be due to differences in host sources. For comparison, sequences were obtained from 182 E. coli isolates from various fecal samples (Table 2). Of the 81 alleles identified in these isolates, some were observed exclusively in one of these host species categories (e.g., uidA5 and uidA11 only in birds and uidA9, uidA13, and uidA15 only in humans); however, many alleles occurred in several host species categories. For relating environmental isolates to possible host sources, alleles that had been observed in only one host category could readily be “assigned” to that category; however, for the others, an algorithm for making these assignments on a probabilistic basis was developed. We calculated the frequency with which a specific allele had occurred in a particular host category (data not shown) and then determined the ratio of these frequencies from one host category to the others. Thus, allele uidA7 was observed in 3.22% of all human sequences, in 1.39% of all farm animal sequences (a ratio of 7:3), and not at all in any of the bird or pet sequences. Taking a ratio of these frequencies, we assigned uidA7 isolates of unknown origin to humans with a probability of 0.7 and to farm animals with a probability of 0.3, and, of course, not at all to birds or pets. This algorithm would assign alleles previously seen only in one host category exclusively to that category but would leave unassigned any allele that had not been seen in any fecal isolate (e.g., uidA10). The probabilities calculated according to this method for the 20 most common alleles are shown in Table 3.
Assignment probability is calculated as the ratio, out of 1.0, of the frequencies with which each allele occurred within each host category. Assignment probability is illustrated only for the 20 most frequently occurring alleles.
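The assignment probability can be sketched as a normalization of per-host frequencies so that they sum to 1.0. The example uses the uidA7 frequencies quoted in the text; the function name and dictionary layout are hypothetical, since the original calculation was done with in-house software that is not shown.

```python
def assignment_probabilities(host_frequencies):
    """Normalize per-host allele frequencies into assignment probabilities.

    host_frequencies maps host category -> percent of that host's reference
    sequences carrying the allele. An allele never seen in any reference
    fecal isolate returns an empty dict, meaning it is left unassigned.
    """
    total = sum(host_frequencies.values())
    if total == 0:
        return {}
    return {host: freq / total for host, freq in host_frequencies.items()}


# uidA7 frequencies as reported in the text (percent of sequences per host).
uidA7 = {"humans": 3.22, "farm animals": 1.39, "birds": 0.0, "pets": 0.0}
uidA7_probabilities = assignment_probabilities(uidA7)
for host, probability in uidA7_probabilities.items():
    print(f"{host}: {probability:.2f}")
```

An allele seen in only one host category normalizes to a probability of 1.0 for that category, which matches the exclusive assignment described in the text.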
To test whether this was an effective method of assigning “unknown” isolates to the correct host category, the rate of correct classification was determined when these criteria were utilized to assign the known fecal isolates to host categories. The 182 sequences from birds, pets, farm animals, and humans were “assigned” to particular hosts on a probabilistic basis using a Visual Basic program developed in-house, in which the assignments were made on the basis of random numbers, and then the percent of isolates correctly classified for each host group was calculated. On 1000 replicates, this procedure assigned the 182 sequences from birds, pets, farm animals, and humans to their correct host categories approximately 60% of the time. One of the reasons that this categorization was not better than this is that the most common allele, uidA1, occurred in all four host categories, which meant that there was a high probability of assigning it to the “wrong” category. The analysis was repeated, leaving out all uidA1 isolates, and this resulted in a rate of correct classification of the remaining isolates of approximately 75% (humans, 81.8%; birds, 74.4%; pets, 65.8%; and farm animals, 83.3%).
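The self-classification test described above can be sketched as a small Monte Carlo simulation: each known isolate is assigned to a host at random according to its allele's assignment probabilities, and the fraction of correct assignments is averaged over replicates. The data below are invented, the sketch reports a single overall rate (the paper also reports rates per host group), and the original program was written in Visual Basic.

```python
import random


def classification_rate(isolates, probabilities, replicates=1000, seed=0):
    """Estimate the fraction of isolates assigned to their true host.

    isolates:      list of (allele, true_host) pairs
    probabilities: allele -> {host: assignment probability}
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    correct = 0
    total = 0
    for _ in range(replicates):
        for allele, true_host in isolates:
            hosts = list(probabilities[allele])
            weights = [probabilities[allele][host] for host in hosts]
            assigned = rng.choices(hosts, weights=weights, k=1)[0]
            correct += assigned == true_host
            total += 1
    return correct / total


# Hypothetical reference set: allele "x" is shared by two hosts, "y" is exclusive.
toy_isolates = ([("x", "humans")] * 7
                + [("x", "farm animals")] * 3
                + [("y", "birds")] * 5)
toy_probabilities = {
    "x": {"humans": 0.7, "farm animals": 0.3},
    "y": {"birds": 1.0},
}
rate = classification_rate(toy_isolates, toy_probabilities)
print(f"{rate:.3f}")
```

For this toy set the expected rate is (7 × 0.7 + 3 × 0.3 + 5 × 1.0) / 15 = 0.72, which shows how an allele shared across hosts pulls the overall rate down, as uidA1 does in the real data.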
This procedure was then applied, leaving out all uidA1 isolates, to isolates sequenced from the environmental samples (Fig. 5). The mean ± standard error of the percentage of isolates assigned to human sources on seven collection days was 30.2 ± 2.8% for Red Run–Bear Creek, 25.6 ± 2.7% for the lower Clinton River, and 16.9 ± 2.0% for the beaches (two-way analysis of variance, p < 0.02; Student–Newman–Keuls pairwise comparisons, Red Run–Bear Creek vs. beaches and lower Clinton River vs. beaches, p < 0.05).
Fig. 5. Average frequencies (mean ± standard error) that isolates from Red Run–Bear Creek (Sites 9 and 10), lower Clinton River (Sites 3, 4, 5, and 6), and Metro and Memorial beaches (Sites 1 and 2) were assigned to host categories of human, birds, pets, or farm animals on seven sampling dates. Indicated percentages are out of the isolates that could be assigned by reference to known source isolates, leaving out isolates of uidA1 and alleles not observed in reference fecal isolates. Two-way analysis of variance (ANOVA), hosts, p < 0.001; sites within hosts, p < 0.02; *, significantly different from beaches within a particular host; and #, significantly different from humans within that collection site (Student–Newman–Keuls pairwise comparisons, p < 0.05).
The same analysis was applied to 34 E. coli isolates obtained on two collection dates from the Twelve Towns Drain combined sewer (Site 11 in Fig. 1), which would be expected to have a higher level of human fecal matter. Excluding, as above, uidA1 alleles from the analysis, the resultant assigned categories averaged human, 51%; birds, 30%; pets, 17%; and farm animals, 3%. This is a higher percentage of isolates assigned to humans and a lower percentage assigned to birds than for any environmental populations that we analyzed, resulting in significant differences compared with the host frequencies at the watershed and beach sites (two-way analysis of variance, p 90% rate of correct classification often reported for ribotype analysis (Carson et al., 2001; Parveen et al., 1999; Samadpour, 2002). Second, the host sources of strains from a combined sewer, which would be expected to have a high human fecal content, were categorized as 51% from a human source, the highest of any site analyzed in this study. Third, analysis of beach isolates in this study can be compared with a ribotype analysis of E. coli populations at Blossom Heath Beach (Site 14 in Fig. 1) (M. Samadpour, unpublished data [microbial source tracking study of the Blossom Heath Beach, report for Macomb County Department of Public Works, 2001]). Among 252 isolates that could be classified by ribotype analysis, investigators reported bird, 67%; pets, 19%; humans, 7%; and other, 7%, similar to our findings at Metro and Memorial Park beaches that birds and pets account for higher proportions of the isolates than human and other sources. A hydrological study of Metro Beach also suggested that a major source of contamination is wildlife, particularly birds, that frequent the lakeshore (Smith et al., 2000). Fourth, these procedures demonstrated a significantly higher percentage of isolates in Red Run–Bear Creek attributable to human sources than were observed at the Lake St. Clair beaches.
Problems in the host category analysis described in this paper are encountered with the rarest alleles and the most common allele. Alleles that were not matched with a previously sequenced fecal isolate were ignored in this analysis. Sequencing more fecal isolates would increase the likelihood of encountering such rare alleles and thereby being able to assign them to a host category. The most common allele was found in all major host categories. Rather than ignore these data, as in the present analysis, this group can be split apart genetically by sequencing additional regions of the genome (T.S. Whittam, personal communication, 2003). Possibly, other regions of the genome will provide greater host specificity than the uidA gene chosen for this study. The present paper demonstrates a sequence-based method of identifying host sources in principle; however, more fecal isolates and additional regions of the genome need to be sequenced to determine the best genome regions for identifying host sources.
An advantage of this sequence-based method of host-source tracking is that sequences are readily compared between laboratories, regardless of the sequencing method. Another advantage is that after sequences are known, informative sequence differences may also be detected with faster secondary methods (e.g., real time allele-specific PCR). With regard to costs, a recent estimate for large genomics projects was $0.03 per base pair for eightfold coverage (approximately $2 per read of a 600-base segment) of a bacterial genome (Read et al., 2002). As sequencing costs continue to fall, the sequence data itself will become a smaller portion of the total analysis costs.
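The cost figures quoted above can be related by simple arithmetic, under the assumption that the per-base price covers all eight reads of each base, so that a single read costs one eighth of the eightfold-coverage price. The variable names are illustrative.

```python
cost_per_base_8x = 0.03   # dollars per base pair at eightfold coverage
coverage = 8              # number of reads covering each base
read_length = 600         # bases per read

# Cost of covering a 600-base segment eight times, then the share of one read.
cost_segment_8x = cost_per_base_8x * read_length
cost_per_read = cost_segment_8x / coverage

print(f"Eightfold coverage of {read_length} bases: ${cost_segment_8x:.2f}")
print(f"Single read: ${cost_per_read:.2f}")
```

Under that assumption a single read comes to about $2.25, consistent with the figure of approximately $2 per read.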
This work was supported by a contract with the Michigan Department of Community Health with matching funds from Wayne State University and technical services provided by the U.S. Army Corps of Engineers (project director: J.L. Ram), and an NIH-IMSD grant, which provided support for F. Gonzales. We appreciate assistance in this project in sample collection and preliminary data analysis by Z. Javetz (a WSU Medical Alumni Society fellow), M. Ponniah, J. Lu, M. Kovur, F. Hamade, and S. Hammad. We gratefully acknowledge cooperation in collecting samples from the Macomb County Health Department and helpful advice and criticisms from Carl Freeman (Biological Sciences, Wayne State University), and Thomas S. Whittam (Microbiology, Michigan State University).