A Novel Linear Code® Nomenclature for Complex Carbohydrates


Trends in Glycoscience and Glycotechnology
Vol.14 No.77 (May 2002) pp.127–137
©2002 FCCA (Forum: Carbohydrates Coming of Age)
GLYCOFORUM
Banin, Ehud; Neuberger, Yael; Altshuler, Yaniv; Halevi, Asaf; Inbar, Ori; Dotan, Nir; and Dukler, Avinoam*
Glycominds Ltd, 1 Yodfat St., Alon Bldg., Global Park, Lod, 71291, Israel
FAX: 972-8-9181081, E-mail: dukler@glycominds.com
Key Words: Linear Code, glycomics, nomenclature, carbohydrates, glycobiology.
Abstract
The Linear Code is a new syntax for representing glycoconjugates and their associated molecules in a simple linear fashion. Similar to the straightforward single letter nomenclature of DNA and proteins, Linear Code presents glycoconjugates in a canonic, compact and practical form while accounting for all relevant stereochemical and structural configurations. It uses a single letter code to represent each monosaccharide and includes a condensed description of the connections between monosaccharides and their modifications, allowing a simple linear representation of these compounds. The new linear syntax enables the implementation of bioinformatics tools for investigation and analysis of glyco-molecules and their biology.
A. Introduction
Glycobiology, the study of carbohydrate-containing molecules and their biological activity, was described in the March 2001 special edition of Science as a “Cinderella field”. As the Genome project reaches its final stages, the data obtained confirm that there are only about 30,000 genes and only small differences between the genomes of different species. The number of native proteins is, however, much larger, mainly due to post-transcriptional and post-translational modifications of the protein messages. The most common and most diverse post-translational modification is protein glycosylation. It has been estimated that more than half of the proteins in nature are glycoproteins (Apweiler et al., 1999). Recent studies have revealed essential roles of carbohydrates in biological processes such as protein folding (Parodi, 2000), protein localization, immunity (Huby et al., 2000), cell proliferation (Zanneta et al., 1994), and hormone and growth factor responses (Van den Steen et al., 1998). In addition, many viruses and bacteria use cell-surface carbohydrates to enter cells and subsequently initiate infections (Rossmann et al., 2000; Hooper et al., 2001). The diversity in carbohydrate function makes carbohydrates exciting new targets for elucidating crucial pathways in a wide range of diseases.
*Corresponding Author.

Until recently, the field of glycobiology has been largely overlooked. A primary reason has been the extreme complexity and variability of carbohydrates derived from: (a) the types of monosaccharides and modifications present; (b) the types of linkages; and (c) the presence of branching. In addition to creating difficulties in the study of carbohydrates, this structural variability sets up an obstacle for development of a simple and consistent nomenclature. While several recommendations and proposals have been introduced for glycan nomenclature and representation (e.g. IUPAC-IUBMB and Bohne-Lang et al.), the field still suffers from inconsistent use of the designated rules and inconvenient illustration for complex carbohydrates.
The simple linear presentation of amino acids and nucleic acids paved the way for bioinformatics tools, such as databases and homology searches. These tools, which seem trivial today, essentially served as the foundation for genomics and proteomics. In order to develop glycomic tools for databases and bioinformatics, a simple and comprehensive linear representation must first be employed. To meet this need, a new syntax called the Linear Code™ has been developed for representing glycoconjugates and their associated molecules in a simple linear fashion. Similar to the straightforward nomenclature of DNA and proteins, Linear Code presents complex carbohydrates in a compact and practical form while accounting for all relevant stereochemical and structural configurations. This paper describes the novel Linear Code syntax, as well as the symbols and rules used for representation of complex carbohydrates.
B. Carbohydrate representation
There are several established formats for chemical presentation of saccharides. The Fischer and Haworth projections are frequently used. IUPAC-IUBMB has recommended the use of three letter codes for the presentation of monosaccharides and has suggested extended and condensed forms for the presentation of oligo- and polysaccharide chains (Fig 1).
B-1. Linear Code representation of saccharide units
The smallest unit comprising a carbohydrate is the basic saccharide unit (SU). The saccharide unit is composed of four elements: the monosaccharide name, modifications (if any), its anomericity (the α and β configurations of the glycosidic bond) and the position at which it is bound to a given SU. The Linear Code offers a simple way of representing saccharide units and the connections between them.
B-1-1. The monosaccharide
The Linear Code assigns a single letter code to the most common structures of monosaccharides found in vertebrates (Table I). In cases where the monosaccharides are different from the common structure, they are expressed as follows:
• Stereoisomers (D or L) of the common monosaccharides are indicated with an apostrophe: ' (MS').
• Monosaccharides with a different ring structure (furanose/pyranose) in relation to the common structure are indicated with a caret: ^ (MS^).
• Monosaccharides that differ in both stereospecificity and ring structure are indicated with a tilde: ~ (MS~).
Example:
D-Galp = A (The most common structure of Galactose)
L-Galp = A'
D-Galf = A^
L-Galf = A~

Fig 1. Recommended symbols and conventions for drawing glycan structures compared to the new Linear Code representation. The example used is a typical branched “biantennary” N-glycan with two types of outer termini.
[Figure: the glycan drawn with symbols for N-acetylglucosamine (GlcNAc), Mannose (Man), Galactose (Gal), Fucose (Fuc) and Glucose (Glc), with linkages labeled β4, β2, α3 and α6, and as a full chemical structure. Panel labels: Traditional Representation, Full Representation, Linear Code.]
Text form of the structure (branch layout not preserved):
β-D-Manp-(1-4)-β-D-GlcpNAc-(1-4)-α-D-GlcpNAc
β-D-Galp-(1-4)-β-D-GlcpNAc-(1-2)-α-D-Manp-(1-6)
β-D-GlcpNAc-(1-2)-α-D-Manp-(1-3)
α-L-Fucp-(1-6)
Linear Code:
GNb2Ma3(Ab4GNb2Ma6)Mb4GNb4(Fa6)GNa
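The suffix rules of section B-1-1 can be sketched as a small helper. The following Python snippet is illustrative only: the function name `variant_code` and its two boolean flags are assumptions made for this example and are not part of the Linear Code definition. It takes a Table I code for the common structure and appends the apostrophe, caret or tilde as described.

```python
def variant_code(base: str, other_stereoisomer: bool = False,
                 other_ring: bool = False) -> str:
    """Return the Linear Code for a monosaccharide that may deviate from
    its common structure (hypothetical helper, not from the paper).

    base               -- code of the common structure, e.g. "A" for D-Galp
    other_stereoisomer -- True if the D/L configuration differs
    other_ring         -- True if the ring form (furanose/pyranose) differs
    """
    if other_stereoisomer and other_ring:
        return base + "~"   # differs in both stereochemistry and ring
    if other_stereoisomer:
        return base + "'"   # differs in stereochemistry only
    if other_ring:
        return base + "^"   # differs in ring structure only
    return base             # the common structure


# The four galactose forms from the example in the text.
print(variant_code("A"))                                            # D-Galp
print(variant_code("A", other_stereoisomer=True))                   # L-Galp
print(variant_code("A", other_ring=True))                           # D-Galf
print(variant_code("A", other_stereoisomer=True, other_ring=True))  # L-Galf
```

Keeping the two deviations as independent flags mirrors the text, where the tilde is simply the combination of the two single deviations.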

B-1-2. Modifications of the sugar chain
Modifications are defined as any addition of non-carbohydrate moieties to the basic SU. The modifications are represented by adding square brackets that include the connecting position of the modification to the SU, followed by the modification symbol (Table II) in the form: [#symbol]. For example: D-Glcp with Sulfate (S) in position 3 would be written G[3S]. If there is more than one modification on the same monosaccharide, they are written in numerical order according to their position, within the same brackets. Exceptions include certain monosaccharides with common modifications, for example: N-acetylgalactosamine (D-GalpNAc) can be presented by A[2N], but is instead represented with a short two letter code as AN. In the same manner N-Acetylneuraminic acid is presented as NN, and N-Acetylglucosamine is presented as GN (Table I).
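As a rough illustration of the bracket rule, the Python sketch below builds the modified code from a list of (position, symbol) pairs. Several details are assumptions made for this example rather than statements from the paper: the function name `with_modifications`, the `sep` parameter (the text does not say how multiple modifications are separated inside the brackets, so they are simply concatenated by default), and the bracketed forms G[2N] and N[5N] behind GN and NN, which are inferred by analogy with the A[2N] case given in the text.

```python
# Two-letter short forms named in the text. Only A[2N] -> AN is stated
# explicitly; the keys for GN and NN are assumptions made by analogy.
SHORT_FORMS = {
    ("A", ((2, "N"),)): "AN",
    ("G", ((2, "N"),)): "GN",
    ("N", ((5, "N"),)): "NN",
}


def with_modifications(base, mods, sep=""):
    """Attach modifications to a monosaccharide code (hypothetical helper).

    base -- monosaccharide code from Table I, e.g. "G"
    mods -- iterable of (position, symbol) pairs, e.g. [(3, "S")]
    sep  -- separator between modifications (assumed; not given in the text)
    """
    ordered = tuple(sorted(mods))       # numerical order by position
    if not ordered:
        return base                     # unmodified monosaccharide
    short = SHORT_FORMS.get((base, ordered))
    if short is not None:
        return short                    # common modified sugar, short code
    body = sep.join(f"{position}{symbol}" for position, symbol in ordered)
    return f"{base}[{body}]"


print(with_modifications("G", [(3, "S")]))   # sulfate at position 3 of D-Glcp
print(with_modifications("A", [(2, "N")]))   # N-acetylgalactosamine
```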
B-1-3. Connection to a neighboring MS
In general, Linear Code uses lower case symbols to represent connecting motifs to the SU such as anomericity, repeating and cyclic structures. Two components appear when illustrating the connection between adjacent monosaccharides: the sugar's anomer, and the position at which the sugar is connected to the adjacent sugar. Anomericity is expressed using the letters “a” and “b” to represent α and β anomers, respectively. These appear immediately following the modification. The connection position will appear after the anomer.
For example:
β-D-Galp(2P)-(1-3)-β-D-Glcp
would be written as:
A[2P]b3Gb
In cases where a monosaccharide is connected from its first position to a modification and then to another monosaccharide, the modification will be written in square brackets [ ] after the anomer, with no number in the brackets (Ab[P]G). If the sugar at the reducing end is in its open form (ol) the letter “o” (in lower case) is added. By convention, carbohydrates are read from right to left. Consistent with this custom, the Linear Code also reads from right to left (i.e. from the reducing end of the carbohydrate).

Table I: Linear Codes of common monosaccharide structures (ordered by branch hierarchy).
Trivial Name  Monosaccharide / Core (1)                   Linear Code
D-Glcp        D-Glucose                                   G
D-Galp        D-Galactose                                 A
D-GlcpNAc     N-Acetylglucosamine                         GN
D-GalpNAc     N-Acetylgalactosamine                       AN
D-Manp        D-Mannose                                   M
D-Neup5Ac     N-Acetylneuraminic acid                     NN
D-Neup        Neuraminic acid                             N
KDN (2)       2-Keto-3-deoxynononic acid                  K
Kdo           3-Deoxy-D-manno-2-octulopyranosylonic acid  W
D-GalpA       D-Galacturonic acid                         L
D-Idop        D-Iduronic acid                             I
L-Rhap        L-Rhamnose                                  H
L-Fucp        L-Fucose                                    F
D-Xylp        D-Xylose                                    X
D-Ribp        D-Ribose                                    B
L-Araf        L-Arabinofuranose                           R
D-GlcpA       D-Glucuronic acid                           U
D-Allp        D-Allose                                    O
D-Apip        D-Apiose                                    P
D-Fruf        D-Fructofuranose                            E
(1) All the monosaccharides are in their pyranose form unless otherwise noted.
(2) KDN: 3-deoxy-D-glycero-D-galacto-nonulosonic acid.

Table II: Linear Code of common modifications.
Modification Type             Linear Code
deacetylated N-acetyl         Q
ethanolaminephosphate         PE
inositol                      IN
methyl                        ME
N-acetyl                      N
O-acetyl                      T
phosphate                     P
phosphocholine                PC
pyruvate                      PYR
sulfate                       S
sulfide                       SH
2-aminoethylphosphonic acid   EP
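The connection rules of section B-1-3 can be illustrated with a minimal Python sketch that joins saccharide units into an unbranched Linear Code string. The class `SaccharideUnit`, its field names and the function `encode_chain` are hypothetical names chosen for this example. The sketch covers only the plain case (code, anomer, connection position) and does not handle a modification at position 1, the open-form “o” suffix, or branching.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class SaccharideUnit:
    """One saccharide unit (hypothetical container, not from the paper)."""
    code: str                       # code incl. any modifications, e.g. "A[2P]"
    anomer: str                     # "a" for alpha, "b" for beta
    position: Optional[int] = None  # position on the adjacent sugar;
                                    # None for the unit at the reducing end

    def encode(self) -> str:
        if self.anomer not in ("a", "b"):
            raise ValueError("anomer must be 'a' or 'b'")
        # Order follows the text: code and modification, anomer, position.
        suffix = "" if self.position is None else str(self.position)
        return f"{self.code}{self.anomer}{suffix}"


def encode_chain(units: Iterable[SaccharideUnit]) -> str:
    """Concatenate units written in order, reducing end last."""
    return "".join(unit.encode() for unit in units)


# beta-D-Galp(2P)-(1-3)-beta-D-Glcp, the example from the text
chain = [SaccharideUnit("A[2P]", "b", 3), SaccharideUnit("G", "b")]
print(encode_chain(chain))
```

Storing the connection position on the unit that is attached, rather than on the unit it attaches to, matches the way the string is written: each unit carries its own anomer and the position it links to.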
B-2. Complex carbohydrates
Complex carbohydrates are composed of sequences of bound saccharide units. They range from simple linear forms to highly branched, repeating and cyclic structures. The Linear Code contains a diverse set of rules to account for all possible combinations.

References
Carbohydrates and soluble lectins in the regulation of cell adhesion and proliferation.