METHODS
Sample Collection
All studies of South Indian populations were performed with the approval of the Institutional Review Board of the University of Utah, Andhra University, and the government of India. Adult males living in the district of Visakhapatnam, Andhra Pradesh, were questioned about their caste affiliations and surnames and the birthplaces of their parents. Those who were unrelated to any other subject by at least three generations were considered eligible to participate.
We classified caste populations based upon the traditional ranking of these castes by varna (defined below), occupation, and socioeconomic status. According to various Sanskrit texts, Hindu populations were partitioned originally into four categories or varna: Brahmin, Kshatriya, Vysya, and Sudra (Tambia 1973; Elder 1996). Those in each varna performed occupations assigned to their category. Brahmins were priests; Kshatriya were warriors; Vysya were traders; and Sudra were to serve the three other varna (Tambia 1973; Elder 1996). Each varna was assigned a status. Brahmin, Kshatriya, and Vysya were considered of higher status than the Sudra because they are considered the twice-born castes and are differentiated from all other castes in the caste hierarchy. This is the rationale behind classifying them as the upper group of castes (Tambia 1973).
The Kapu and the Yadava are called once-born castes that have traditionally been classified in the Sudra, the lowest of the original four varna. However, the status of the Sudra was actually higher than that of a fifth varna, the Panchama. This fifth varna was added at a later date to include the so-called untouchables, who were excluded from the other four varna (Elder 1996). The untouchable varna includes the Mala and Madiga. The position of the Relli in the caste hierarchy is somewhat ambiguous, but they have usually been classified in the lower caste group. Therefore, prior to the collection of any data, males from eight different Telugu-speaking castes (n = 265) were ranked into upper (Niyogi and Vydiki Brahmin, Kshatriya, Vysya [n = 80]), middle (Telega and Turpu Kapu, Yadava [n = 111]), and lower (Relli, Madiga, Mala [n = 74]) groups (Bamshad et al. 1998). This ranking has been used by previous investigators (Krishnan and Reddy 1994).
After obtaining informed consent, ∼8 mL of whole blood or 5 plucked scalp hairs were collected from each participant. Extractions were performed at Andhra University using established methods (Bell et al. 1981).
MtDNA Polymorphisms
The mtDNA data consisted of 68, 116, and 73 HVR1 sequences and 79, 159, and 72 restriction-site haplotypes from largely the same individuals in upper, middle, and lower castes, respectively. These data were compared to data from 143 Africans (15 Sotho-Tswana, 7 Tsonga, 14 Nguni, 24 San, 5 Biaka Pygmies, 33 Mbuti Pygmies, 9 Alur, 18 Hema, and 18 Nande), 78 Asians (12 Cambodians, 17 Chinese, 19 Japanese, 6 Malay, 9 Vietnamese, 2 Koreans, and 13 Asians of mixed ancestry), and 99 Europeans (20 unrelated males of the French CEPH kindreds, 69 unrelated Utah males of Northern European descent, and 10 Poles) (Jorde et al. 1995, 1997). Mitochondrial sequence data from these 597 individuals are available at: http://www.genome.org/supplemental/.
In addition to our samples, the phylogenetic analyses included 98 published HVR1 sequences from two castes (48 Havlik and 43 Mukri) and a tribal population (7 Kadar) living in south-western India (Mountain et al. 1995), and restriction-site haplotypes from one caste (62 Lobana) from Northern India, three tribal populations from Northern (12 Tharu and 18 Bhoksa) and Southern (86 Lambadi) India, and 122 individuals from various caste populations in Uttar Pradesh (Kivisild et al. 1999). Phylogenetic relationships of HVR1 sequences assigned to haplogroup M were estimated for Indians (this study), Turks (this study), Central Asian populations (Comas et al. 1998), Mongolians (Kolman et al. 1996), Chinese (Horai et al. 1996), and Japanese (Horai et al. 1996; Seo et al. 1998).
The mtDNA HVR1 sequence was determined by fluorescent Sanger sequencing using a dye terminator cycle sequencing kit (Applied Biosystems) according to the manufacturer's specifications (Bamshad et al. 1998). Sequencing reactions were resolved on an ABI 377 automated DNA sequencer, and sequence data were analyzed using ABI DNA analysis software and SEQUENCHER software (Genecodes). To identify mtDNA haplotypes and haplogroups (a group of haplotypes that share some sequence variants), major continent-specific genotypes (Torroni et al. 1994, 1996; Wallace 1995) for the following polymorphic mtDNA restriction sites were determined: HpaI 3592, DdeI 10394, AluI 10397, AluI 13262, BamHI 13366, AluI 5176, HaeIII 4830, AluI 7025, HinfI 12308, AccI 14465, AvaII 8249, AluI 10032, BstOI 13704, and HaeII 9052.
Y-Chromosome and Autosomal Polymorphisms
Y-chromosome-specific STRs (DYS19, DYS288, DYS388, DYS389A, DYS390) were amplified using published conditions (Hammer et al. 1998). PCR products were separated on an ABI 377 automated sequencer and scored using ABI Genotyper software. Y-chromosome STR data were collected from 622 males including 280 South Indians, ∼200 Africans (Seielstad et al. 1999; this study), 40 Asians, and 102 Europeans. Autosomal data were collected from 608 individuals including 265 South Indians, 155 Africans, 70 Asians, and 118 Europeans.
The Y-chromosome-specific biallelic polymorphisms tested included: DYS188792, DYS194469, DYS211105, DYS221136, DYS257108, DYS287, M3, M4, M9, M12, M15, SRY4064, SRY10831.1, SRY10831.2, p12f2, PN1, PN2, PN3, RPS4Y711, and Tat (Hammer and Horai 1995; Hammer et al. 1997, 1998, 2000; Underhill et al. 1997; Zerjal et al. 1997; Karafet et al. 1999). All individuals tested negative for the Y Alu insert (DYS287). A complete description of the Y-chromosome STR loci can be found in Kayser et al. (1997). A table of the biallelic Y-chromosome haplotype frequencies in the upper, middle, and lower castes is available at http://www.genome.org/supplemental/.
For the Y-chromosome biallelic dataset, comparisons were made to a different set of worldwide populations including: East Asians from Japan, Korea, China, and Vietnam (n = 460); Western Europeans from Britain and Germany (n = 77); Southern Europeans from Italy and Greece (n = 148); and Eastern Europeans from Russia and Romania (n = 102) (M.F. Hammer, unpubl.). The complete dataset of Indians consisted of 55 Brahmin, 111 Yadava and Kapu, and 74 Relli, Mala, and Madiga.
Autosomal polymorphisms were amplified using conditions specifically optimized for each system. Further information on these conditions is available at http://www.genetics.utah.edu/~swatkins/pub/Alu_data.htm or http://www.genome.org/supplemental. With minor exceptions owing to typing failures or other causes, the same individuals from each population were used to create each dataset (i.e., mtDNA, Y chromosome, and autosomal). The complete dataset of genotypes from all 40 autosomal loci is available at: http://www.genome.org/supplemental/.
Statistical Analyses
Genetic distances for Y-chromosome STRs were estimated using the method of Shriver et al. (1995), which assumes a stepwise mutation model. Genetic distances for mitochondrial and autosomal markers were calculated as pairwise FST distances, using the ARLEQUIN package (Schneider et al. 1997). For autosomal polymorphisms, Nei's standard distances and their standard errors were estimated using DISPAN (http://www.bio.psu.edu/IMEG), and 90% confidence intervals were estimated by multiplying the standard error by 1.65. The significance of the FST distances between populations was assessed against a null distribution of pairwise FST distances generated by permuting haplotypes between populations. The p-value of the test is the proportion of permutations leading to an FST value larger than or equal to the observed one. Genotypic differentiation was estimated using GENEPOP (Raymond and Rousset 1995) vers. 3.2 (http://www.cefe.cnrs-mop.fr/). The null hypothesis tested is that there is a random distribution of K different haplotypes among r populations (the contingency table). All potential states of the contingency table are explored with a Markov chain, and the probability of observing a table as likely as or less likely than the observed sample configuration is estimated.
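The permutation procedure and the confidence-interval rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ARLEQUIN or DISPAN implementation: the function names are hypothetical, and `differentiation` is a simple stand-in statistic (one minus the ratio of mean within-population haplotype diversity to pooled diversity), not the FST estimator used in the analyses.

```python
import random
from collections import Counter


def differentiation(pop_a, pop_b):
    """Stand-in statistic (an assumption, not ARLEQUIN's FST):
    1 - (mean within-population diversity / pooled diversity)."""
    def diversity(haplotypes):
        n = len(haplotypes)
        return 1.0 - sum((c / n) ** 2 for c in Counter(haplotypes).values())

    h_within = (diversity(pop_a) + diversity(pop_b)) / 2.0
    h_total = diversity(pop_a + pop_b)
    return 0.0 if h_total == 0 else 1.0 - h_within / h_total


def permutation_p_value(pop_a, pop_b, statistic=differentiation,
                        n_perm=1000, seed=0):
    """Permute haplotypes between two populations and return the observed
    statistic and the proportion of permutations with a value >= observed."""
    pop_a, pop_b = list(pop_a), list(pop_b)
    rng = random.Random(seed)
    observed = statistic(pop_a, pop_b)
    pooled = pop_a + pop_b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign haplotypes to populations at random
        perm_a, perm_b = pooled[:len(pop_a)], pooled[len(pop_a):]
        if statistic(perm_a, perm_b) >= observed:
            hits += 1
    return observed, hits / n_perm


def confidence_interval_90(estimate, standard_error):
    """90% interval taken as estimate +/- 1.65 standard errors."""
    half_width = 1.65 * standard_error
    return estimate - half_width, estimate + half_width
```

Any other pairwise statistic can be passed through the `statistic` argument. The number of permutations shown is an arbitrary default, because the text does not state how many were used for the FST tests.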
Estimates of significance for the correlation between interindividual caste rank differences and interindividual autosomal genetic distances were made by forming two n × n matrices, where n is the number of individuals. For the first matrix, interindividual genetic distances were based on the proportion of Alu insertions/deletions shared by each pair of individuals. To form the second matrix, each individual was assigned a score according to his rank in the caste hierarchy, both for caste groups (i.e., upper caste = 1, middle caste = 2, lower caste = 3) and for separate castes (i.e., Brahmin = 1, Kshatriya = 2, Vysya = 3, Kapu = 4, Yadava = 5, Relli = 6, Mala = 7, and Madiga = 8). An interindividual matrix of score distances was formed by taking the absolute value of the difference between the scores of each pair of individuals. The matrix of genetic distances was compared to 10,000 permuted matrices of score distances using a Mantel matrix comparison test (Mantel 1967).
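The matrix comparison described above can be illustrated with a short sketch. It assumes a one-sided test with the Pearson correlation of the off-diagonal elements as the Mantel statistic, and it takes the genetic distance matrix as given; the function names are hypothetical and the code is not the software used in the study.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def score_distance_matrix(scores):
    """Absolute difference between the caste-rank scores of each pair."""
    return [[abs(a - b) for b in scores] for a in scores]


def upper_triangle(matrix, order):
    """Off-diagonal entries (i < j) after relabeling individuals by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]]
            for i in range(n) for j in range(i + 1, n)]


def mantel_test(genetic, score_dist, n_perm=10_000, seed=0):
    """Return the observed correlation and the proportion of permuted
    score-distance matrices whose correlation is >= the observed one."""
    n = len(genetic)
    identity = list(range(n))
    g = upper_triangle(genetic, identity)
    observed = pearson(g, upper_triangle(score_dist, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # permute rows and columns together
        if pearson(g, upper_triangle(score_dist, order)) >= observed:
            hits += 1
    return observed, hits / n_perm
```

Rows and columns of the score matrix are permuted jointly so that each permuted matrix remains a valid distance matrix among the same individuals. The sketch requires that both matrices contain at least two distinct off-diagonal values; otherwise the correlation is undefined.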
To illustrate phylogenetic relationships, we constructed reduced median (Bandelt et al. 1995) and neighbor-joining networks (Felsenstein 1989). Coalescence times were calculated as in Forster et al. (1996), using the estimator ρ, which is the average transitional distance from the founder haplotype.
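As an illustration of the estimator ρ, the sketch below averages the number of transitions separating each sampled sequence from a designated founder haplotype. It is a simplification under stated assumptions: differences are counted directly against the founder in aligned sequences rather than along the branches of a network, the function names are hypothetical, and the conversion to years takes a user-supplied rate because no calibration is given in this section.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def count_transitions(seq, founder):
    """Number of transitional differences between a sequence and the founder."""
    if len(seq) != len(founder):
        raise ValueError("sequences must be aligned to the same length")
    return sum((a, b) in TRANSITIONS
               for a, b in zip(founder.upper(), seq.upper()))


def rho(sequences, founder):
    """Average transitional distance of the sampled sequences from the founder."""
    return sum(count_transitions(s, founder) for s in sequences) / len(sequences)


def coalescence_time(rho_value, years_per_transition):
    """Convert rho to years; years_per_transition is an assumed calibration
    supplied by the caller."""
    return rho_value * years_per_transition
```

For example, with founder `"ACGT"` and sampled sequences `"ACGT"`, `"GCGT"`, and `"GTGT"`, the sequences carry 0, 1, and 2 transitions, so ρ is 1.0.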