ARIES is a BBSRC-funded resource to generate epigenomic information on a range of human tissues at multiple time points across the lifecourse. The epigenetic data will be integrated with genetic and transcriptomic data and made available to the scientific community, with the facility to link data to both exposure and phenotypic data.
ARIES will use both Illumina Infinium 450k methylation arrays and bisulfite sequencing (BS-seq) to generate epigenetic data on samples from the ALSPAC cohort. In addition, genome-wide DNA methylation in four human tissues, together with matched peripheral blood samples, will be profiled on Illumina Infinium 450k arrays so that methylation levels can be compared between tissues.
The primary objectives of ARIES in the period 2011-2013 are:
"This package provides classes for holding and manipulating Illumina methylation data. Based on eSet, it can contain MIAME information, sample information, feature information, and multiple matrices of data. An "intelligent" import function, methylumiR can read the Illumina text files and create a MethyLumiSet. The methylumIDAT function can directly read raw IDAT files from HumanMethylation27 and HumanMethylation450 microarrays. Normalization, background correction, and quality control features for GoldenGate, Infinium, and Infinium HD arrays are also included."(from the Methylumi reference manual)
We use Methylumi for:
 "ALLELE SPECIFIC EXTENSION"
 "EXTENSION GAP"
 "FIRST HYBRIDIZATION"
 "PCR CONTAMINATION"
 "SECOND HYBRIDIZATION"
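```r
library(methylumi)
# mldat: a MethyLumiSet, e.g. read with methylumiR() or methylumIDAT()
avgPval <- colMeans(pvals(mldat))   # mean detection p-value per sample
toKeep <- (avgPval < 0.05)          # keep samples with reliable detection
pData(mldat)$Gender <- "F"          # records sample sex (set to "F" for every sample here)
mldat.norm <- normalizeMethyLumiSet(mldat[, toKeep])  # normalize the retained samples
```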
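methylumi keeps methylated and unmethylated intensities alongside beta values. The minimal base-R sketch below shows how the two usual summary measures relate to those intensities. Beta values use the conventional offset of 100, and M-values are a log2 ratio with an offset of 1. The simulated matrices and the helper names `beta_values()` and `m_values()` are hypothetical and are not part of methylumi.

```r
# Simulated (hypothetical) intensities: rows = CpG probes, columns = samples
set.seed(1)
meth <- matrix(runif(12, 100, 5000), nrow = 4,
               dimnames = list(paste0("cg", 1:4), paste0("S", 1:3)))
unmeth <- matrix(runif(12, 100, 5000), nrow = 4, dimnames = dimnames(meth))

# Beta value: methylated fraction; the offset of 100 stabilizes the ratio
# when both intensities are low
beta_values <- function(M, U, offset = 100) M / (M + U + offset)

# M-value: log2 ratio of methylated to unmethylated signal (offset alpha = 1)
m_values <- function(M, U, alpha = 1) log2((M + alpha) / (U + alpha))

beta_mat <- beta_values(meth, unmeth)
mval_mat <- m_values(meth, unmeth)
round(beta_mat, 3)
```

Beta values are bounded between 0 and 1 and are easy to interpret. M-values are unbounded and usually better behaved in linear models.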
"The minfi package provides tools for analyzing Illumina's Methylation arrays, with a special focus on the new 450k array for humans. At the moment Illumina's 27k methylation arrays are not supported. The tasks addressed in this package include preprocessing, QC assessments, identification of interesting methylation loci and plotting functionality. In general, the analysis of 450k data is not straightforward and we anticipate many advances in this area in the near future. The input data to this package are IDAT files, representing two different color channels prior to normalization. It is possible to use Genome Studio files together with the data structures contained in this package, but in general Genome Studio files are already normalized and we do not recommend this."(from the minfi user guide)
We use minfi for:
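```r
library(minfi)
library(minfiData)
baseDir <- system.file("extdata", package = "minfiData")
list.files(baseDir)
## "5723646052" "5723646053" "SampleSheet.csv"
list.files(file.path(baseDir, "5723646052"))
## "5723646052_R02C02_Grn.idat" "5723646052_R02C02_Red.idat"
## "5723646052_R04C01_Grn.idat" "5723646052_R04C01_Red.idat"
## "5723646052_R05C02_Grn.idat" "5723646052_R05C02_Red.idat"
# Read the sample sheet and raw red/green intensities (newer minfi: read.metharray.sheet / read.metharray.exp)
targets <- read.450k.sheet(baseDir)
RGset <- read.450k.exp(base = baseDir, targets = targets)
pd <- pData(RGset)   # sample information used to label the QC report
qcReport(RGset, sampNames = pd$Sample_Name, sampGroups = pd$Sample_Group, pdf = "qcReport.pdf")
MSet.raw <- preprocessRaw(RGset)   # no normalization
# Illumina-style background correction and control normalization
MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = "controls", reference = 2)
Mset.swan <- preprocessSWAN(RGset, MSet.raw)   # SWAN applied to the raw MethylSet
```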
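qcReport() writes a PDF. A complementary sample-level check, similar in spirit to minfi's getQC()/plotQC(), compares each array's median log2 methylated and unmethylated intensities against a cutoff. The sketch below reimplements that idea in base R on simulated intensities. The helper `flag_bad_samples()` is hypothetical, and the 10.5 cutoff is an assumption about plotQC's default that should be checked against your minfi version.

```r
# Sample-level QC from median log2 intensities (a sketch, not minfi's code)
flag_bad_samples <- function(M, U, cutoff = 10.5) {
  mMed <- apply(log2(M), 2, median)  # median log2 methylated intensity
  uMed <- apply(log2(U), 2, median)  # median log2 unmethylated intensity
  data.frame(mMed = mMed, uMed = uMed, bad = (mMed + uMed) / 2 < cutoff)
}

# Hypothetical data: three arrays, the third roughly 3 log2 units dimmer
set.seed(2)
M <- matrix(2^rnorm(300, mean = 11.5, sd = 0.5), ncol = 3,
            dimnames = list(NULL, c("S1", "S2", "S3")))
U <- matrix(2^rnorm(300, mean = 11.5, sd = 0.5), ncol = 3,
            dimnames = list(NULL, c("S1", "S2", "S3")))
M[, "S3"] <- M[, "S3"] / 8
U[, "S3"] <- U[, "S3"] / 8
qc <- flag_bad_samples(M, U)
qc
```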
"The wateRmelon package is designed to make it convenient to use the data quality metrics and normalization methods from our paper as part of existing pipelines or workflows, and so as much as possible we have implemented S4 methods for MethyLumiSet objects (methylumi package), MethylSet and RGChannelSet objects (minfi package) and exprmethy450 objects (IMA package). In addition to our own functions, the package also contains functions by Matthieu Defrance and Nizar Touleimat and Andrew Teschendorff as well as a wrapper for the SWAN method."(from the wateRmelon homepage)
We use wateRmelon for:
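```r
library(wateRmelon)
data(melon)                          # example MethyLumiSet shipped with wateRmelon
melon.pf <- pfilter(melon)           # drop low-quality samples and probes (detection p-value filter)
melon.dasen.pf <- dasen(melon.pf)    # dasen normalization
sex <- pData(melon.dasen.pf)$sex     # sample sex from the phenotype data
```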
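dasen() builds on quantile normalization. It applies that step after background adjustment, treating methylated and unmethylated intensities and Infinium I and II probes separately. The base-R sketch below shows only the plain quantile-normalization step on a small matrix. `quantile_normalize()` is a hypothetical helper and is not the dasen implementation.

```r
# Plain quantile normalization: every column (sample) is given the same
# distribution, the row-wise mean of the column-sorted data
quantile_normalize <- function(x) {
  ranks <- apply(x, 2, rank, ties.method = "first")  # within-sample ranks
  target <- rowMeans(apply(x, 2, sort))                # reference distribution
  out <- apply(ranks, 2, function(r) target[r])        # map ranks to target
  dimnames(out) <- dimnames(x)
  out
}

x <- matrix(c(5, 2, 3, 4,
              4, 1, 4, 2,
              3, 4, 6, 8), nrow = 4)
qn <- quantile_normalize(x)
qn
```

After normalization every column holds the same sorted values, 2, 3, 14/3 and 17/3. The within-sample ordering of each column is preserved.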
"CpGassoc is designed to rapidly test for association between methylation across thousands of CpG sites and a phenotype of interest, adjusting for relevant covariates and fixed or random effects for chip or batch. This package also includes tools to perform permutation tests, and to make QQ plots, manhattan plots, and scatterplots for individual CpG sites. CpGassoc can create .pdf files and .eps files for all of these plots. Selecting file.type="eps" will result in publication quality editable postscript files that can be opened in Adobe Illustrator or Photoshop. The two main functions of CpGassoc are cpg.assoc and cpg.perm. cpg.assoc will perform an association test that models methylation at each CpG sites as a function of the phenotype of interest and other covariates. cpg.perm performs permutation tests to obtain empirical and multiple-testing-adjusted p-values." (from the CpGassoc tutorial)
We use CpGassoc for:
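The model behind cpg.assoc() regresses methylation at each site on the phenotype plus covariates. The base-R sketch below fits one lm() per CpG on simulated data and adds Benjamini-Hochberg FDR values. `per_cpg_assoc()` is a hypothetical helper. It leaves out CpGassoc's chip/batch effects, its permutation tests (cpg.perm) and its plotting.

```r
# One linear model per CpG: methylation ~ phenotype + covariates
per_cpg_assoc <- function(beta_mat, pheno, covars = NULL) {
  res <- t(apply(beta_mat, 1, function(y) {
    df <- data.frame(y = y, pheno = pheno)
    if (!is.null(covars)) df <- cbind(df, covars)
    coefs <- summary(lm(y ~ ., data = df))$coefficients
    coefs["pheno", c("Estimate", "Pr(>|t|)")]
  }))
  colnames(res) <- c("effect", "p.value")
  # Benjamini-Hochberg FDR across sites
  data.frame(res, fdr = p.adjust(res[, "p.value"], method = "BH"))
}

# Hypothetical data: 5 CpGs x 40 samples, only cg1 depends on the phenotype
set.seed(3)
n <- 40
pheno <- rnorm(n)
covars <- data.frame(age = runif(n, 20, 60))
beta_mat <- matrix(runif(5 * n, 0.2, 0.8), nrow = 5,
                   dimnames = list(paste0("cg", 1:5), NULL))
beta_mat[1, ] <- 0.5 + 0.1 * pheno + rnorm(n, sd = 0.01)
res <- per_cpg_assoc(beta_mat, pheno, covars)
res
```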
"IMA (Illumina Methylation Analyzer) is a computational package designed to automate the pipeline for analyzing site-level and region-level methylation changes in epigenetic studies utilizing the 450K DNA methylation microarray. The pipeline loads the data from Illumina platform and provides user-customized functions commonly required to perform exploratory methylation analysis and summarization for individual sites as well as annotated regions.
Note that instead of providing recommendations about which specific analysis method should be used, the main purpose of developing IMA package is to provide a range of commonly used Infinium methylation microarray analysis options for users to choose for their exploratory analysis and summarization in an automatic way. Therefore, it is the best interest for the users to consult experienced bioinformatician/statistician about which specific analysis option/route should be chosen for their 450k microarray data." (from the IMA homepage)
We use IMA for:
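IMA summarizes site-level values into region-level values, for example all CpGs annotated to a gene's promoter. As an illustration only, the sketch below collapses a beta matrix to regions by taking each sample's median over the CpGs in each region. The helper name, the region labels and the choice of the median are assumptions and do not reflect IMA's API.

```r
# Collapse site-level betas to region level: per sample, median over the
# CpGs annotated to each region
summarize_regions <- function(beta_mat, region) {
  apply(beta_mat, 2, function(col) tapply(col, region, median))
}

beta_mat <- matrix(c(0.1, 0.2, 0.9, 0.8, 0.7,
                     0.3, 0.2, 0.6, 0.5, 0.9), ncol = 2,
                   dimnames = list(paste0("cg", 1:5), c("S1", "S2")))
# Hypothetical annotation: gene name + 450k region category
region <- c("GeneA_TSS200", "GeneA_TSS200",
            "GeneB_1stExon", "GeneB_1stExon", "GeneB_1stExon")
region_mat <- summarize_regions(beta_mat, region)
region_mat
```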
"Non-biological experimental variation or batch effects are commonly observed across multiple batches of microarray experiments, often rendering the task of combining data from these batches difficult. The ability to combine microarray data sets is advantageous to researchers to increase statistical power to detect biological phenomena from studies where logistical considerations restrict sample size or in studies that require the sequential hybridization of arrays. In general, it is inappropriate to combine data sets without adjusting for batch effects. Methods have been proposed to filter batch effects from data, but these are often complicated and require large batch sizes (>25) to implement. Because the majority of microarray studies are conducted using much smaller sample sizes, existing methods are not sufficient. We propose parametric and nonparametric empirical Bayes frameworks for adjusting data for batch effects that is robust to outliers in small sample sizes and performs comparable to existing methods for large samples. We illustrate our methods using two example data sets and show that our methods are justifiable, easy to apply, and useful in practice." (from the Combat homepage)
We use ComBat for:
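```r
# Standalone ComBat script: both arguments are file paths (placeholders here).
# The sva Bioconductor package provides a matrix-based ComBat(dat, batch, mod).
ComBat("expression filename", "sample information filename")
```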
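ComBat starts from a location/scale model in which each batch shifts the mean and scales the variance of every feature. The sketch below applies only that location/scale adjustment: it standardizes each batch per feature, then maps it back to the overall mean and the pooled within-batch SD. It omits ComBat's empirical Bayes shrinkage and covariate handling. `adjust_batches()` is a hypothetical helper, and it needs at least two samples with non-zero variance in every batch.

```r
# Location/scale batch adjustment (no empirical Bayes shrinkage)
adjust_batches <- function(x, batch) {
  batch <- factor(batch)
  out <- x
  for (i in seq_len(nrow(x))) {
    y <- x[i, ]
    grand <- mean(y)                                   # overall mean
    resid <- y - ave(y, batch)                         # within-batch residuals
    pooled_sd <- sqrt(sum(resid^2) / (length(y) - nlevels(batch)))
    for (b in levels(batch)) {
      idx <- batch == b
      # standardize within the batch, then map to pooled scale and grand mean
      out[i, idx] <- (y[idx] - mean(y[idx])) / sd(y[idx]) * pooled_sd + grand
    }
  }
  out
}

# Hypothetical data: 3 features x 12 samples on two plates, plate2 shifted
set.seed(4)
batch <- rep(c("plate1", "plate2"), each = 6)
x <- matrix(rnorm(3 * 12, mean = 0.5, sd = 0.05), nrow = 3)
x[, batch == "plate2"] <- x[, batch == "plate2"] + 0.2
xa <- adjust_batches(x, batch)
```

Because each batch's features are recentred on the overall mean, biological differences that are confounded with batch would also be removed. ComBat avoids this by including covariates in its model.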
"In methylome-wide association studies (MWAS) there are many possible differences between cases and controls (e.g. related to life style, diet, and medication use) that may affect the methylome and produce false positive findings. An effective approach to control for these confounders is to first capture the major sources of variation in the methylation data and then regress out these components in the association analyses. This approach is, however, computationally very challenging because the human genome comprises over 30 million possible methylation sites. We introduce methylPCA that is specifically designed to handle this problem. Specifically, MethylPCA can:
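The approach described for methylPCA first captures the major sources of variation and then regresses them out. It can be sketched at small scale with prcomp(): compute principal components across samples, then take the residuals of each site after regressing on the top k components. This is not methylPCA, which is built for millions of sites. `remove_top_pcs()` and the simulated confounder are illustrative assumptions.

```r
# Capture major axes of variation across samples, then regress them out
remove_top_pcs <- function(beta_mat, k = 2) {
  # prcomp() expects samples in rows, so transpose (sites x samples -> samples x sites)
  pcs <- prcomp(t(beta_mat), center = TRUE, scale. = FALSE)$x[, seq_len(k), drop = FALSE]
  # residual methylation at each site after removing the top k PCs
  resid <- t(apply(beta_mat, 1, function(y) residuals(lm(y ~ pcs))))
  list(pcs = pcs, residuals = resid)
}

set.seed(5)
n <- 20; p <- 200
hidden <- rnorm(n)  # hypothetical unmeasured confounder
beta_mat <- matrix(rnorm(p * n, sd = 0.02), p, n) + outer(runif(p, 0, 0.1), hidden)
out <- remove_top_pcs(beta_mat, k = 1)
```

Instead of using residuals, the returned PCs can be added as covariates in each per-site association model.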
"There has been a long-standing need in biomedical research for a method that quantifies the normally mixed composition of leukocytes beyond what is possible by simple histological or flow cytometric assessments. The latter is restricted by the labile nature of protein epitopes, requirements for cell processing, and timely cell analysis. In a diverse array of diseases and following numerous immune-toxic exposures, leukocyte composition will critically inform the underlying immuno-biology to most chronic medical conditions. Emerging research demonstrates that DNA methylation is responsible for cellular differentiation, and when measured in whole peripheral blood, serves to distinguish cancer cases from controls.
Here we present a method, similar to regression calibration, for inferring changes in the distribution of white blood cells between different subpopulations (e.g. cases and controls) using DNA methylation signatures, in combination with a previously obtained external validation set consisting of signatures from purified leukocyte samples. We validate the fundamental idea in a cell mixture reconstruction experiment, then demonstrate our method on DNA methylation data sets from several studies, including data from a Head and Neck Squamous Cell Carcinoma (HNSCC) study and an ovarian cancer study. Our method produces results consistent with prior biological findings, thereby validating the approach.
Our method, in combination with an appropriate external validation set, promises new opportunities for large-scale immunological studies of both disease states and noxious exposures." (from the paper 'DNA methylation arrays as surrogate measures of cell mixture distribution')
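The key idea is that a whole-blood methylation profile is approximately a weighted combination of reference profiles from purified leukocytes. The simplified base-R sketch below estimates the weights for one sample by unconstrained least squares, then clips negative values and rescales if they sum above 1. The published method uses a constrained projection instead. The reference matrix, the cell-type labels and `estimate_cell_props()` are hypothetical.

```r
# Estimate cell-type proportions for one sample by projecting its methylation
# profile onto reference profiles (least squares + post-hoc constraints)
estimate_cell_props <- function(y, ref) {
  w <- coef(lm(y ~ ref - 1))       # unconstrained least-squares weights
  w <- pmax(w, 0)                  # proportions cannot be negative
  if (sum(w) > 1) w <- w / sum(w)  # nor sum to more than 1
  names(w) <- colnames(ref)
  w
}

set.seed(6)
ref <- matrix(runif(50 * 3), nrow = 50,
              dimnames = list(NULL, c("Gran", "CD4T", "Bcell")))  # hypothetical
true_w <- c(0.6, 0.3, 0.1)
y <- as.vector(ref %*% true_w) + rnorm(50, sd = 0.01)
est <- estimate_cell_props(y, ref)
round(est, 3)
```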