Weizmann Institute of Science

Version 2.3
Release: October, 2008
Affymetrix GeneChip
Version:
HG-U95A-E
HG-U133-A
Analyzed by MAS5.0

Synchronized with
GeneCards Version 2.39

  • Microarrays

  • TSI-Tissue specificity indices

  • SAGE-Serial Analysis of Gene Expression

  • Electronic Northern

  • Binary expression patterns

  • Human GeneAtlas HG-U133A

  • GeneAnnot based custom probesets

  • Microarrays

    • RNA source

    • PolyA+ RNA samples from twelve normal human tissues were purchased from Clontech (Palo Alto, CA).
      This collection of major human tissues includes:
      Bone marrow (catalog number: 6573-1), brain (6516-1), heart (6533-1), kidney (6538-1), liver (6510-1), lung (6524-1), pancreas (6539-1), prostate (6546-1), skeletal muscle (6541-1), spinal cord (6593-1), spleen (6542-1) and thymus (6536-1).
    • Data Normalization

    • Arrays were analyzed, and expression values (called signal) were calculated for each gene using Microarray Suite (MAS) version 5.0 software (Affymetrix, Santa Clara, CA) with default parameter settings. Scaling was not done via a MAS 5.0 option. Instead, the intensities of each array were log10 transformed and scaled to a constant reference value (global normalization). This reference value was the mean of all log intensities in all of the tissues.
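The global normalization described above can be sketched as follows. This is a minimal illustration, not the GeneNote implementation: the text does not say how each array is scaled to the reference, so the sketch assumes an additive shift in log space that brings each array's mean log intensity to the global mean. The array names and signal values are made up.

```python
import math

def global_normalize(arrays):
    """arrays: dict mapping an array name to a list of positive signal values."""
    # log10-transform every intensity
    logged = {name: [math.log10(v) for v in values] for name, values in arrays.items()}
    # Reference value: mean of all log intensities across all arrays
    all_logs = [v for values in logged.values() for v in values]
    reference = sum(all_logs) / len(all_logs)
    # ASSUMPTION: each array is shifted so that its own mean equals the reference
    normalized = {}
    for name, values in logged.items():
        shift = reference - sum(values) / len(values)
        normalized[name] = [v + shift for v in values]
    return normalized, reference

# Hypothetical signals for two arrays
arrays = {"heart_1": [10.0, 100.0, 1000.0], "liver_1": [100.0, 1000.0, 10000.0]}
normalized, reference = global_normalize(arrays)
print(reference)
```

After the shift, every array has the same mean log intensity, so differences between tissues are no longer driven by overall array brightness.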
    • Expression Profiles

    • Duplicate measurements were obtained for twelve normal human tissues hybridized against Affymetrix GeneChips HG-U95A-E. The intensity values (shown on the y-axis) were normalized and drawn on a novel scale, which is an intermediate between log and linear scales. This enables displaying several orders of magnitude on the same graph, while emphasizing the differences between them.
    • Aggregate Expression

    • The bar graphs represent the averaged expression level calculated for a given gene. The calculation is done by averaging the individual profiles of all of the gene's probe-sets. The detailed expression profile, with the annotation for each individual probe-set, is presented in the table on the web page of the given gene.
    • Variation plots

    • Multiple probe-sets corresponding to the given gene are included in its tissue vector calculation only if their normalized intensity levels reach a threshold in at least one tissue. The variation of included and excluded probe-sets is visualized in the x-y plane: the x-axis shows Pearson's correlation between each individual probe-set vector and the average tissue vector; the y-axis shows the relative length of an individual probe-set vector (its scalar length divided by that of the average vector). The average is shown as a black square, while individual probe-sets are depicted as colored circles.
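The two coordinates of a variation plot can be computed as in the sketch below. It is an illustration only: the average is taken over whatever profiles are passed in, so the inclusion threshold mentioned above is assumed to have been applied beforehand, and the example profiles are invented.

```python
import math

def mean_vector(profiles):
    """Element-wise average of several equal-length tissue vectors."""
    return [sum(col) / len(profiles) for col in zip(*profiles)]

def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for a constant vector."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def length(v):
    """Euclidean (scalar) length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def variation_points(profiles):
    """Return one (correlation, relative length) pair per probe-set profile."""
    avg = mean_vector(profiles)
    return [(pearson(p, avg), length(p) / length(avg)) for p in profiles]

# Two hypothetical probe-set profiles over three tissues
points = variation_points([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]])
print(points)
```

In this example both profiles have the same shape as the average, so both correlations are 1, while the relative lengths (2/3 and 4/3) show that one probe-set reads consistently lower than the other.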
    • Probe set annotation

    • The list of probes and their sequences was obtained from the Affymetrix public database (http://www.affymetrix.com/index.affx), wherein each probe-set on Affymetrix HG-U95 is constructed from 16 probes, each 25 nucleotides long. The 16 probes from each probe-set were aligned against the mRNA sequences from the most comprehensive public databases using the GeneAnnot algorithm (http://genecards.weizmann.ac.il/geneannot/). The alignment was performed using the BLAT algorithm (Kent, W.J. (2002) BLAT--the BLAST-like alignment tool. Genome Res, 12, 656-64.), allowing one mismatch along the sequence. The quality scores of a probe-set are given per specific gene, taking into consideration the other genes to which its probes were aligned. Specificity and sensitivity parameters describe the quality of each probe-set.
    • Specificity calculation

    • If a probe is aligned to a single gene its score will be 1. However, if it is aligned to n genes the value will be 1/n. The specificity is calculated by the weighted summation of all probe values in the set, divided by the total number of aligned probes in the set.
    • Sensitivity calculation

    • Probe set sensitivity is calculated by the summation of the successfully aligned probes divided by the total number of probes in the set (e.g., 16).
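The specificity and sensitivity definitions above can be illustrated with a short sketch. It assumes one reading of the text: scores are computed for one target gene, and only probes aligned to that gene count as "aligned". The gene symbols `GENE_A` and `GENE_B` and the alignment results are hypothetical.

```python
def probe_set_scores(genes_per_probe, target_gene):
    """genes_per_probe: one set of gene symbols per probe (empty set = no alignment)."""
    # Probes that aligned to the target gene (ASSUMPTION: scores are per gene)
    hits = [genes for genes in genes_per_probe if target_gene in genes]
    if not hits:
        return 0.0, 0.0
    # Each probe contributes 1/n, where n is the number of genes it aligned to
    specificity = sum(1.0 / len(genes) for genes in hits) / len(hits)
    # Fraction of the probes in the set that aligned successfully
    sensitivity = len(hits) / len(genes_per_probe)
    return specificity, sensitivity

# Hypothetical probe-set of 16 probes: 12 unique, 2 shared with another gene, 2 unaligned
probes = [{"GENE_A"}] * 12 + [{"GENE_A", "GENE_B"}] * 2 + [set()] * 2
specificity, sensitivity = probe_set_scores(probes, "GENE_A")
print(round(specificity, 4), sensitivity)  # 0.9286 0.875
```

Here the specificity is (12 x 1 + 2 x 0.5) / 14 and the sensitivity is 14 / 16.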

    Tissue specificity indices (TSI)

    • Signal Quantilization

    • The MAS5.0 intensities, ranging on a decimal logarithmic scale from log10(30) to roughly 4, were converted into a quantile scale. The expression data, averaged over the two replicates, were divided into 11 bins, whereby 10 equal-density bins (quantiles) spanned the values above log10(30), and an 11th zero bin included the remaining low-intensity values. Henceforth, the quantiled profiles were used in the analysis.
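A minimal sketch of this binning is shown below. It assumes the equal-density bins are derived from the pool of values passed in and that values exactly at the threshold go to the zero bin. Neither detail is stated in the text, and the input values are invented.

```python
import bisect
import math

THRESHOLD = math.log10(30)  # lower bound of the expressed range, from the text

def quantilize(values, n_bins=10, threshold=THRESHOLD):
    """Map log10 intensities to bins 0..n_bins (0 is the low-intensity zero bin)."""
    above = sorted(v for v in values if v > threshold)
    bins = []
    for v in values:
        if v <= threshold:
            bins.append(0)
            continue
        rank = bisect.bisect_right(above, v)  # 1-based rank among expressed values
        bins.append(min(n_bins, math.ceil(rank * n_bins / len(above))))
    return bins

# One value below the threshold followed by 20 increasing expressed values
values = [1.0] + [2.0 + 0.1 * i for i in range(20)]
print(quantilize(values))
```

With 20 expressed values and 10 bins, each bin receives two values, and the single low value falls into bin 0.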
    • Entropy-based tissue specificity index

    • These indices were defined as follows:
      Entropy-based - TSIent - The quantiled profile was first normalized by dividing each intensity by the total intensity of that profile. The TSI is then based upon Shannon's entropy (Shannon, C.E. (1963) The mathematical theory of communication, University of Illinois Press, Champaign, IL.):
      where N is the number of tissues (12), and p is the normalized expression.
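The sketch below computes Shannon's entropy of a normalized profile. How the entropy is turned into an index is not shown in this text, so the mapping used in `tsi_ent` (one minus the entropy divided by its maximum, log2 N) is an assumption chosen so that a uniform profile scores 0 and a single-tissue profile scores 1. It should not be read as the published definition.

```python
import math

def shannon_entropy(profile):
    """Entropy (in bits) of a profile normalized by its total intensity."""
    total = sum(profile)  # must be positive; an all-zero profile is not handled
    p = [x / total for x in profile]
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def tsi_ent(profile):
    # ASSUMPTION: index = 1 - H / log2(N); the source formula is not reproduced here
    return 1 - shannon_entropy(profile) / math.log2(len(profile))

uniform = [5] * 12              # same quantile in all 12 tissues
single = [0] * 11 + [10]        # expressed in one tissue only
print(tsi_ent(uniform), tsi_ent(single))
```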
    • Statistical analysis of differential expression

    • Single-classification ANOVA with equal sample sizes (Sokal and Rohlf 2000) was employed on the preprocessed 24-element expression vector composed of 12 tissues in duplicate. ANOVA could be applied due to the near-normal shape of the expression intensity distribution (Shmueli et al. 2003). Henceforth, we refer to the tissue expression vector of a probeset as its 'profile'. For each profile, the sum of the squares of the differences between the replicates was compared with the sum of the squares of the differences between the averages of the tissue expressions. A P-value was calculated using the F statistic, taking into account the degrees of freedom. To account for the multiple comparison problem inherent in calculating the P-values for all 62,839 probesets, we calculated the false discovery rate of the P-values (Benjamini and Hochberg 1995). We chose a P-value cutoff of 0.0036, which corresponds to an estimated 1% error rate. This resulted in 22,936 profiles that are defined as "differentially expressed". The remaining profiles were defined as housekeeping when no differences were shown within replicates or between samples, as not-expressed when the expression in all samples was below the threshold, and as uncharacterized when the P-value was above the cutoff.
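The two statistical steps above can be sketched in plain Python: a one-way ANOVA F statistic for equally sized groups, and the Benjamini-Hochberg adjustment of a list of P-values. This is an illustration with invented numbers, not the original analysis code. Converting F to a P-value requires the F distribution, which is left out to keep the sketch free of external libraries.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for k groups of equal size n (e.g. 12 tissues x 2)."""
    k, n = len(groups), len(groups[0])
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / n for g in groups]
    ss_between = n * sum((m - grand_mean) ** 2 for m in means)
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    # Degrees of freedom: k - 1 between groups, k * (n - 1) within groups.
    # Identical replicates give ss_within == 0 and are not handled here.
    return (ss_between / (k - 1)) / (ss_within / (k * (n - 1)))

def benjamini_hochberg(pvalues):
    """Return BH-adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):          # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

f_stat = anova_f([[1.0, 3.0], [5.0, 7.0]])        # two hypothetical tissues in duplicate
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(f_stat, adjusted)
```

Profiles whose adjusted P-value falls below the chosen error rate would then be called differentially expressed.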
    • Highest Value Referenced - TSIhvr

    • The profile is first normalized by dividing each intensity by the highest intensity of that profile. The TSI is then:
      where N is the number of tissues (12) and x is the normalized expression vector.
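The formula for this index is not reproduced in this text. The sketch below assumes it takes the form commonly known as the tau index, the sum of (1 - x_i) over all tissues divided by (N - 1), which is consistent with the symbols N and x defined above. Treat that form as an assumption. The example profiles are invented.

```python
def tsi_hvr(profile):
    """Highest-value-referenced index under the ASSUMED form sum(1 - x_i) / (N - 1)."""
    top = max(profile)                    # must be positive
    x = [v / top for v in profile]        # normalize by the highest intensity
    return sum(1 - xi for xi in x) / (len(x) - 1)

uniform = [7] * 12            # equal expression in all 12 tissues
single = [0] * 11 + [9]       # expression in one tissue only
print(tsi_hvr(uniform), tsi_hvr(single))  # 0.0 1.0
```

Under this form the index is 0 for a perfectly uniform profile and 1 when only one tissue shows any expression.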
    • Geometrical based - TSIgeo

    • The profile was first normalized by dividing each intensity by the highest quantile (10), thus effectively representing the profile as a point in a unit 12-dimensional hypercube. We then compared each such point with the diagonal vector representing the housekeeping profile:
      where r is the distance to the diagonal and y is the distance to the closest axis.
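The two distances named above can be computed as in the following sketch. It assumes that "the diagonal" is the line through the origin along (1, ..., 1) and that the closest axis is the one for the tissue with the largest value. How r and y are combined into the final index is not shown in this text, so the sketch stops at the two distances.

```python
import math

def geo_distances(quantile_profile, top_quantile=10):
    """Return (r, y): distance to the diagonal and to the closest coordinate axis."""
    x = [q / top_quantile for q in quantile_profile]   # point in the unit hypercube
    norm_sq = sum(v * v for v in x)
    # Length of the projection onto the unit diagonal direction (1, ..., 1) / sqrt(N)
    proj = sum(x) / math.sqrt(len(x))
    r = math.sqrt(max(0.0, norm_sq - proj * proj))
    # Distance to axis i drops component i; the closest axis drops the largest one
    y = math.sqrt(max(0.0, norm_sq - max(x) ** 2))
    return r, y

housekeeping = [10] * 12          # on the diagonal
specific = [10] + [0] * 11        # on a single axis
print(geo_distances(housekeeping), geo_distances(specific))
```

A housekeeping-like profile lies on the diagonal (r near 0, large y), whereas a profile expressed in a single tissue lies on an axis (y equal to 0, large r).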
    • Gap based - TSIgap

    • We defined the 'gap' for each expression profile as the maximum difference between the neighboring values in the sorted quantile vector. When the same 'gap' was found more than once in a profile, the minimum was taken. The 'gap' was then scaled relative to the maximum possible gap (10) such that the index ranges from 0 to 1.
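A minimal sketch of the gap index follows, using invented quantile profiles. The tie rule mentioned above concerns which of several equal gaps is selected and does not change the index value, so it is not modeled here.

```python
def tsi_gap(quantile_profile, max_gap=10):
    """Largest difference between neighbors in the sorted profile, scaled to 0..1."""
    s = sorted(quantile_profile)
    gaps = [b - a for a, b in zip(s, s[1:])]
    return max(gaps) / max_gap

print(tsi_gap([0] * 11 + [10]))                          # 1.0
print(tsi_gap([5] * 12))                                 # 0.0
print(tsi_gap([0, 0, 0, 1, 1, 2, 2, 8, 8, 9, 9, 10]))    # 0.6
```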

    SAGE-Serial Analysis of Gene Expression

    • SAGE method

    • For ten normal human tissues (currently the relevant SAGE libraries are not available for spleen and thymus, shown in lower case and flagged with *) CGAP datasets Hs.frequencies and Hs.libraries are mined for information about the number of SAGE tags per tissue. Tags are reassigned to a Unigene cluster and after that to a particular gene by mining Hs.best_gene, Hs.best_tag and Hs_GeneData.
      The expression level of a particular gene in a particular tissue was calculated as the number of appearances of the corresponding tag divided by the total number of tags in libraries derived from that tissue. These fractions were then normalized by multiplying by 1.2M, and the obtained normalized counts are presented on the same root scale as that used for the electronic Northern pictures and experimental tissue vectors.
      Please note: Currently, only associations with minimal ambiguity participate in the analysis.
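The SAGE normalization above amounts to a scaled fraction, as in this sketch. The library sizes and tag counts are invented, and pooling all libraries of a tissue before dividing is an assumption about how "libraries derived from that tissue" are combined.

```python
def normalized_sage_count(libraries, scale=1_200_000):
    """libraries: list of (tag_count, total_tags) pairs for one tissue."""
    tag_count = sum(count for count, _ in libraries)
    total_tags = sum(total for _, total in libraries)
    # Fraction of all tags in the tissue's libraries, scaled to 1.2M tags
    return tag_count * scale / total_tags

# Hypothetical tissue with two libraries
level = normalized_sage_count([(5, 20_000), (10, 40_000)])
print(level)  # 300.0
```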
    • Best matching tag

    • The tag that is the best match for that gene, and vice versa.
    • Unique tag

    • Tag that uniquely represents the gene and doesn't correspond with any other gene.

    Electronic Northern

    • Electronic Northern method

    • For the shown set of normal human tissues, NCBI's Unigene dataset Hs.data is mined for information about the number of unique clones per gene per tissue. Clones are assigned to particular tissues by applying data-mining heuristics to Unigene's library information file Hs.lib.info. Electronic expression results were calculated by dividing the number of clones per gene by the number of clones per tissue. They were then normalized by multiplying by 1M, and the obtained normalized counts are presented on the same root scale as the experimental tissue vectors. This scale (shown on the y-axis) is an intermediate between log and linear scales. This enables displaying several orders of magnitude on the same graph, while emphasizing the differences between them.

    Binary expression patterns

    • Binary expression patterns method

    • Arbitrary expression profiles are also presented in binary pattern form when possible, with at most 5 unique binary patterns shown for each gene. For each expression profile, all entries above a defined relative cutoff (termed 'gap') receive the value 1, are represented in black, and are considered "over expressed"; those below the cutoff receive the value 0, are represented in white, and are considered "under expressed". Various binary patterns in different tissues are shown per gene, with their counts on the left. (The grey stripes show undefined binary patterns.)
      Please note: "Under expression" does not always mean the lack of expression.
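A binary pattern can be derived from a profile as in the sketch below. The text does not spell out how the relative cutoff is chosen, so the sketch assumes it sits in the middle of the largest gap between neighboring sorted values, and that a flat profile yields an undefined pattern. Both choices are assumptions, and the example profile is invented.

```python
def binary_pattern(profile):
    """Return a list of 0/1 values, or None when no gap exists (undefined pattern)."""
    s = sorted(profile)
    gaps = [(b - a, a, b) for a, b in zip(s, s[1:])]
    # Largest gap; on ties max() keeps the first, i.e. the lowest-lying gap
    width, low, high = max(gaps, key=lambda g: g[0])
    if width == 0:
        return None
    cutoff = (low + high) / 2   # ASSUMPTION: cutoff placed mid-gap
    return [1 if v > cutoff else 0 for v in profile]

print(binary_pattern([1, 2, 9, 1, 10, 2]))   # [0, 0, 1, 0, 1, 0]
print(binary_pattern([4, 4, 4]))             # None
```

Counting identical patterns across the profiles of a gene would then give the per-pattern counts shown next to each pattern.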

    Human GeneAtlas HG-U133A

    • Human GeneAtlas HG-U133A

    • Reference: Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB (2004) A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci U S A. 2004 Apr 20;101(16):6062-7

    GeneAnnot based custom probesets

    • Expression patterns based on custom CDFs method

    • Custom Chip Definition Files (CDFs) based on GeneAnnot were used for gene expression data preprocessing with the MAS5.0 absolute analysis algorithm. This novel set of CDFs allows a gene-centered analysis of gene expression data obtained from human Affymetrix GeneChips, removing the noise from probes with ambiguous matches to gene sequences.
      MAS5.0 intensities were then normalized as described above. See also the article by Itai Yanai et al. for details on the data normalization procedure.