
Kazuma Labolatory's

Nothing is Impossible. ✿


There Wasn't Just One Kind( ˙◊˙◞ )◞!?💡 "Double Quotation Marks"🌟

I found this out a little while ago. The quotation marks in this blog's quote blocks also use this double quotation mark (hereafter DQM):

 

Character references (HTML source code)
Character | Unicode number | Named | Decimal numeric | Hexadecimal numeric
“ | U+201C | &ldquo; | &#8220; | &#x201C;

 

By the way, the one you get by typing directly on a keyboard is this one:

Character references (HTML source code)
Character | Unicode number | Named | Decimal numeric | Hexadecimal numeric
" | U+0022 | &quot; | &#34; | &#x22;
Leading zeros can be omitted (for example, &#x0022; and &#x22; are equivalent).

 

If you want to use the “ (U+201C) DQM, which a keyboard doesn't type directly, write one of the three character references in the HTML source code.
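Here is a rough illustration of the three forms in the tables above. It's a small Python sketch: Python is just my choice for these examples, and the function name `char_references` is hypothetical. It builds the named, decimal and hexadecimal references for any character. The name comes from Python's standard `html.entities.codepoint2name` table, which only covers the HTML 4 entity names.

```python
import html.entities
from typing import Optional, Tuple


def char_references(ch: str) -> Tuple[Optional[str], str, str]:
    """Return (named, decimal, hexadecimal) HTML references for one character.

    The named form is None when the HTML 4 entity table has no name for it.
    """
    cp = ord(ch)  # Unicode code point, e.g. 0x201C for “
    name = html.entities.codepoint2name.get(cp)
    named = f"&{name};" if name else None
    decimal = f"&#{cp};"         # base-ten code point
    hexadecimal = f"&#x{cp:X};"  # base-sixteen code point, no leading zeros
    return named, decimal, hexadecimal


for ch in ["\u201c", '"']:
    print(f"U+{ord(ch):04X}", char_references(ch))
```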

 

Also, just like letters, the DQM looks different depending on the font you specify. There are lots of variations, which is fun✧(⸝⸝⸝ↂ⃙⃙⃚⃛︎ωↂ⃙⃙⃚⃛︎⸝⸝⸝)✨🌟

[Samples: Left Double Quotation Mark (“) in Sans Serif, Noto Sans JP, Josefin Sans, Raleway, Roboto Mono, and Inconsolata]

 

 

 

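Besides the two above (U+0022, U+201C), there are several other kinds of DQM. The reason apparently has to do with the evolution of computers and the standardization of character sets that came with it🌟 Unicode turned out to be deeper than I expected. Out of curiosity I peeked at the spec on Unicode®︎.org, and it was over 1,000 pages(  Д ) ゚ ゚💦w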
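If you want to see some of those other kinds yourself, this Python sketch scans the character names bundled with Python's `unicodedata` module. The filter is my own rough heuristic: keep names containing "QUOTATION MARK" but not "SINGLE". It therefore also catches things like « and some obscure tag characters. The exact list depends on the Unicode version your Python ships with.

```python
import sys
import unicodedata
from typing import List, Tuple


def double_quote_like_marks() -> List[Tuple[str, str]]:
    """Heuristic: characters named "...QUOTATION MARK..." that aren't SINGLE ones."""
    found = []
    for cp in range(sys.maxunicode + 1):
        name = unicodedata.name(chr(cp), "")  # "" for unnamed code points
        if "QUOTATION MARK" in name and "SINGLE" not in name:
            found.append((f"U+{cp:04X}", name))
    return found


print("Unicode version:", unicodedata.unidata_version)
for code, name in double_quote_like_marks():
    print(code, name)
```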

 

What is Unicode®︎?

Unicode -IT用語辞典- (an IT terminology dictionary)

(I'm leaving the explanation entirely to them|*´-`)w)

 

What is a Character Reference?

Character References (HTML)

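Simply put, a character reference is a way of writing a character in HTML by the name or the code (its Unicode code point) assigned to it. The character references available in HTML are specified in W3C's HTML5 specification.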

When writing HTML, you can type a character's reference instead of the character itself, and the character will still be displayed on screen.
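The browser resolves each reference back to the character it names. Python's standard `html.unescape` follows the HTML5 reference rules, so it can stand in for that step in a quick sketch. This is only an analogy for what a browser's parser does.

```python
import html

# Three spellings of LEFT DOUBLE QUOTATION MARK (U+201C) in HTML source.
sources = ["&ldquo;", "&#8220;", "&#x201C;"]

for src in sources:
    # html.unescape resolves named, decimal and hexadecimal references.
    print(f"{src:10} -> {html.unescape(src)}")

# All three give the same character you'd get by typing “ directly.
assert {html.unescape(s) for s in sources} == {"\u201c"}

# The same holds for the keyboard's U+0022.
assert {html.unescape(s) for s in ["&quot;", "&#34;", "&#x22;"]} == {'"'}
```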

The spec gives various reasons why this method exists alongside direct input, but I'll skip them here (not because it's a hassle😎✨w).

Character references come in the following three kinds (*spec excerpts quoted below):

❶ Named character references
❷ Decimal numeric character references
❸ Hexadecimal numeric character references

(Old) 5.3 Character references (HTML 4.01, W3C)

5.3 Character references

A given character encoding may not be able to express all characters of the document character set. For such encodings, or when hardware or software configurations do not allow users to input some document characters directly, authors may use SGML character references. Character references are a character encoding-independent mechanism for entering any character from the document character set.

Character references in HTML may appear in two forms:

Numeric character references (either decimal or hexadecimal).

Character entity references.

Character references within comments have no special meaning; they are comment data only.
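The "encoding-independent" point is easy to see in a Python sketch. Plain ASCII can't express “ or ”, but Python's standard `xmlcharrefreplace` error handler writes each unencodable character as a decimal numeric character reference. Resolving those references afterwards gives the original text back.

```python
import html

text = "He said \u201cHello\u201d"

# Plain ASCII has no code for U+201C/U+201D, so strict encoding fails.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print("ASCII can encode it directly:", ascii_ok)  # False

# Replace every character ASCII can't express with &#NNNN;.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'He said &#8220;Hello&#8221;'

# Resolving the references restores the original text.
restored = html.unescape(ascii_bytes.decode("ascii"))
print(restored == text)  # True
```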

 

(Current) 8.1.4 Character references (HTML 5.1, W3C)

8.1.4 Character references

In certain cases described in other sections, text may be mixed with character references. These can be used to escape characters that couldn't otherwise legally be included in text.

Character references must start with a U+0026 AMPERSAND character (&). Following this, there are three possible kinds of character references:

Named character references

The ampersand must be followed by one of the names given in the named character references section, using the same case. The name must be one that is terminated by a ";" (U+003B) character.

Decimal numeric character reference

The ampersand must be followed by a "#" (U+0023) character, followed by one or more ASCII digits, representing a base-ten integer that corresponds to a Unicode code point that is allowed according to the definition below. The digits must then be followed by a ";" (U+003B) character.

Hexadecimal numeric character reference

The ampersand must be followed by a "#" (U+0023) character, which must be followed by either a "x" (U+0078) character or a "X" (U+0058) character, which must then be followed by one or more ASCII hex digits, representing a base-sixteen integer that corresponds to a Unicode code point that is allowed according to the definition below. The digits must then be followed by a ";" (U+003B) character.

The numeric character reference forms described above are allowed to reference any Unicode code point other than U+0000, U+000D, permanently undefined Unicode characters (noncharacters), surrogates (U+D800–U+DFFF), and control characters other than space characters.

An ambiguous ampersand is a U+0026 AMPERSAND character (&) that is followed by one or more alphanumeric ASCII characters, followed by a ";" (U+003B) character, where these characters do not match any of the names given in the named character references section.
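The rules in this excerpt are mechanical enough to sketch as a checker. This Python version covers the authoring rules only; real browsers parse far more leniently. The function names are hypothetical. Treating "control characters" as Unicode category Cc (the C0 and C1 ranges) is my assumption about the spec's definition.

```python
import html.entities
import re

# "&#" + decimal digits, or "&#x"/"&#X" + hex digits, then ";".
NUMERIC_REF = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")
# "&" + one or more ASCII alphanumerics + ";" (a would-be named reference).
NAMED_REF = re.compile(r"&([0-9A-Za-z]+);")

# HTML 5.1 space characters that are controls, minus U+000D (excluded outright).
SPACE_CONTROLS = {0x09, 0x0A, 0x0C}


def is_noncharacter(cp: int) -> bool:
    # U+FDD0..U+FDEF plus the last two code points of every plane (xxFFFE, xxFFFF).
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def is_allowed_code_point(cp: int) -> bool:
    if cp in (0x0000, 0x000D) or cp > 0x10FFFF:
        return False
    if 0xD800 <= cp <= 0xDFFF:  # surrogates
        return False
    if is_noncharacter(cp):
        return False
    is_control = cp <= 0x1F or 0x7F <= cp <= 0x9F  # assumption: category Cc
    return not is_control or cp in SPACE_CONTROLS


def is_valid_numeric_reference(ref: str) -> bool:
    m = NUMERIC_REF.fullmatch(ref)
    if m is None:
        return False  # wrong shape, e.g. the ";" is missing
    decimal, hexadecimal = m.groups()
    cp = int(decimal) if decimal is not None else int(hexadecimal, 16)
    return is_allowed_code_point(cp)


def is_ambiguous_ampersand(ref: str) -> bool:
    m = NAMED_REF.fullmatch(ref)
    # html.entities.html5 keys carry the trailing ";" (e.g. "ldquo;").
    return m is not None and m.group(1) + ";" not in html.entities.html5


for ref in ["&#8220;", "&#X201c;", "&#0;", "&#xD800;", "&#x1FFFF;", "&#13;", "&#x201C"]:
    print(ref, is_valid_numeric_reference(ref))
print("&ldquo;", is_ambiguous_ampersand("&ldquo;"))
print("&foo;", is_ambiguous_ampersand("&foo;"))
```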

 

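Summing up the above plus my own experience…

  • There are two ways to display a character (letters and so on) in HTML…

    (i) type the character directly into the source code

    (ii) type a character reference into the source code

  • "HTML entity" and "entity name" are older names (from HTML 4.01 and earlier?) plus abbreviations.

    (Old: HTML 4.01) These probably refer to the "character entity reference" in the quote above.

    (New: HTML 5.1) In the narrow sense they seem to mean "named character reference". In the broad sense they mean the whole family that includes it, "character reference" (3 kinds: Named | Decimal numeric | Hexadecimal numeric).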

The names were never systematic to begin with, and people often abbreviate them, so this was confusing and hard to follow…(๑ˊ•ω•;)💦 I finally get it now.

Using Character References in CSS

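In CSS, HTML character references themselves don't work. Of the three kinds, only ❸ the hexadecimal form carries over, written as a backslash escape followed by the hexadecimal (base-16) code point.
【Example】: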

.foo:before {
	content: '\201c';
}

That is, "\" (backslash) + "the hexadecimal code point, i.e. the Unicode number as is" (201c for U+201C).
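Since the CSS escape is just the code point in hex, you can derive it mechanically from an HTML numeric reference. Here is a Python sketch; both function names are hypothetical helpers, and the selector is arbitrary.

```python
import re

HTML_NUMERIC_REF = re.compile(r"&#(?:([0-9]+)|[xX]([0-9A-Fa-f]+));")


def html_ref_to_css_escape(ref: str) -> str:
    """Convert an HTML numeric character reference into a CSS escape.

    Both &#x201C; and &#8220; become backslash + 201c. If a hex digit
    followed the escape, a separating space would be needed (not handled).
    """
    m = HTML_NUMERIC_REF.fullmatch(ref)
    if m is None:
        raise ValueError(f"not a numeric character reference: {ref!r}")
    decimal, hexadecimal = m.groups()
    cp = int(decimal) if decimal is not None else int(hexadecimal, 16)
    return "\\" + format(cp, "x")


def content_rule(selector: str, ref: str) -> str:
    # Emits a rule shaped like the .foo:before example above.
    return f"{selector} {{\n\tcontent: '{html_ref_to_css_escape(ref)}';\n}}"


print(html_ref_to_css_escape("&#x201C;"))  # \201c
print(content_rule(".foo:before", "&#8220;"))
```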

4.1.3 Characters and case (CSS 2.1, W3C)

4.1.3 Characters and case

The following rules always hold:

All CSS syntax is case-insensitive within the ASCII range (i.e., [a-z] and [A-Z] are equivalent), except for parts that are not under the control of CSS. For example, the case-sensitivity of values of the HTML attributes "id" and "class", of font names, and of URIs lies outside the scope of this specification. Note in particular that element names are case-insensitive in HTML, but case-sensitive in XML.

In CSS, identifiers (including element names, classes, and IDs in selectors) can contain only the characters [a-zA-Z0-9] and ISO 10646 characters U+00A0 and higher, plus the hyphen (-) and the underscore (_); they cannot start with a digit, two hyphens, or a hyphen followed by a digit. Identifiers can also contain escaped characters and any ISO 10646 character as a numeric code (see next item). For instance, the identifier "B&W?" may be written as "B\&W\?" or "B\26 W\3F". Note that Unicode is code-by-code equivalent to ISO 10646 (see [UNICODE] and [ISO10646]).

In CSS 2.1, a backslash (\) character can indicate one of three types of character escape. Inside a CSS comment, a backslash stands for itself, and if a backslash is immediately followed by the end of the style sheet, it also stands for itself (i.e., a DELIM token).
First, inside a string, a backslash followed by a newline is ignored (i.e., the string is deemed not to contain either the backslash or the newline). Outside a string, a backslash followed by a newline stands for itself (i.e., a DELIM followed by a newline).

Second, it cancels the meaning of special CSS characters. Any character (except a hexadecimal digit, linefeed, carriage return, or form feed) can be escaped with a backslash to remove its special meaning. For example, "\"" is a string consisting of one double quote. Style sheet preprocessors must not remove these backslashes from a style sheet since that would change the style sheet's meaning.

Third, backslash escapes allow authors to refer to characters they cannot easily put in a document. In this case, the backslash is followed by at most six hexadecimal digits (0..9A..F), which stand for the ISO 10646 ([ISO10646]) character with that number, which must not be zero. (It is undefined in CSS 2.1 what happens if a style sheet does contain a character with Unicode codepoint zero.) If a character in the range [0-9a-fA-F] follows the hexadecimal number, the end of the number needs to be made clear. There are two ways to do that:

  1. with a space (or other white space character): "\26 B" ("&B"). In this case, user agents should treat a "CR/LF" pair (U+000D/U+000A) as a single white space character.
  2. by providing exactly 6 hexadecimal digits: "\000026B" ("&B")

In fact, these two methods may be combined. Only one white space character is ignored after a hexadecimal escape. Note that this means that a "real" space after the escape sequence must be doubled.

If the number is outside the range allowed by Unicode (e.g., "\110000" is above the maximum 10FFFF allowed in current Unicode), the UA may replace the escape with the "replacement character" (U+FFFD). If the character is to be displayed, the UA should show a visible symbol, such as a "missing character" glyph (cf. 15.2, point 5).

Note: Backslash escapes are always considered to be part of an identifier or a string (i.e., "\7B" is not punctuation, even though "{" is, and "\32" is allowed at the start of a class name, even though "2" is not). The identifier "te\st" is exactly the same identifier as "test".
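To check my reading of these three escape types, here is a minimal decoder sketch in Python. `decode_css_escapes` is a hypothetical helper, not part of any CSS library. It only handles escape sequences and doesn't tokenize comments or strings. CSS 2.1 leaves code point zero undefined. For that case, plus surrogates and values above U+10FFFF, I borrowed the U+FFFD replacement from the later CSS Syntax Level 3 spec, which is an assumption here.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")
WHITESPACE = " \t\n\r\f"


def decode_css_escapes(s: str, in_string: bool = True) -> str:
    out = []
    i, n = 0, len(s)
    while i < n:
        ch = s[i]
        if ch != "\\" or i + 1 >= n:
            out.append(ch)  # ordinary char, or a backslash at the very end
            i += 1
            continue
        nxt = s[i + 1]
        if nxt in HEX_DIGITS:
            # Third kind: up to six hex digits naming a code point.
            j = i + 1
            while j < n and j - (i + 1) < 6 and s[j] in HEX_DIGITS:
                j += 1
            cp = int(s[i + 1:j], 16)
            if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                cp = 0xFFFD  # assumption borrowed from CSS Syntax Level 3
            out.append(chr(cp))
            # Swallow exactly one white space character; CR LF counts as one.
            if s.startswith("\r\n", j):
                j += 2
            elif j < n and s[j] in WHITESPACE:
                j += 1
            i = j
        elif nxt in "\n\r\f":
            if in_string:
                # First kind: backslash + newline inside a string is ignored.
                i += 3 if s.startswith("\r\n", i + 1) else 2
            else:
                out.append("\\")  # outside a string it stands for itself
                i += 1
        else:
            # Second kind: the backslash cancels the character's special meaning.
            out.append(nxt)
            i += 2
    return "".join(out)


print(decode_css_escapes("\\26 B"), decode_css_escapes("\\000026B"))
print(decode_css_escapes("B\\26 W\\3F"), decode_css_escapes("te\\st"))
```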

4.4.1 Referring to characters not represented in a character encoding (CSS 2.1, W3C)

4.4.1 Referring to characters not represented in a character encoding

A style sheet may have to refer to characters that cannot be represented in the current character encoding. These characters must be written as escaped references to ISO 10646 characters. These escapes serve the same purpose as numeric character references in HTML or XML documents (see [HTML4], chapters 5 and 25).

The character escape mechanism should be used when only a few characters must be represented this way. If most of a style sheet requires escaping, authors should encode it with a more appropriate encoding (e.g., if the style sheet contains a lot of Greek characters, authors might use "ISO-8859-7" or "UTF-8").

Intermediate processors using a different character encoding may translate these escaped sequences into byte sequences of that encoding. Intermediate processors must not, on the other hand, alter escape sequences that cancel the special meaning of an ASCII character.

Conforming user agents must correctly map to ISO-10646 all characters in any character encodings that they recognize (or they must behave as if they did).

For example, a style sheet transmitted as ISO-8859-1 (Latin-1) cannot contain Greek letters directly: "κουρος" (Greek: "kouros") has to be written as "\3BA\3BF\3C5\3C1\3BF\3C2".

Note. In HTML 4, numeric character references are interpreted in "style" attribute values but not in the content of the STYLE element. Because of this asymmetry, we recommend that authors use the CSS character escape mechanism rather than numeric character references for both the "style" attribute and the STYLE element. For example, we recommend:

<SPAN style="font-family: L\FC beck">...</SPAN>

rather than:

<SPAN style="font-family: L&#252;beck">...</SPAN>
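The Greek and Lübeck examples above can be reproduced with a small Python sketch that escapes only the characters the target encoding can't hold. `css_escape_for_encoding` is a hypothetical name, and uppercase hex is used only to match the spec's examples. It also doesn't escape quotes or backslashes already present in the text.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def _representable(ch: str, encoding: str) -> bool:
    try:
        ch.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def css_escape_for_encoding(text: str, encoding: str) -> str:
    out = []
    for i, ch in enumerate(text):
        if _representable(ch, encoding):
            out.append(ch)  # the encoding can hold it, so keep it as-is
            continue
        esc = "\\" + format(ord(ch), "X")
        nxt = text[i + 1] if i + 1 < len(text) else ""
        # A following hex digit would extend the escape, and one following
        # white space char would be swallowed, so terminate with a space.
        if nxt in HEX_DIGITS or (nxt and nxt in " \t\n\r\f"):
            esc += " "
        out.append(esc)
    return "".join(out)


print(css_escape_for_encoding("κουρος", "iso-8859-1"))  # \3BA\3BF\3C5\3C1\3BF\3C2
print(css_escape_for_encoding("Lübeck", "ascii"))       # L\FC beck
print(css_escape_for_encoding("Lübeck", "iso-8859-1"))  # Lübeck
```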

 

 

Finally, here's a link to an interesting (English) article related to DQMs. Wow, it ran deeper than I imagined~w✨0(:3 )~ ('、3_ヽ)_
