⇒ Writing my pipe dreams on GitHub_20200914

2020/01/22: Just created the page.
2020/02/04: Pulled in the XML template. Hang on, this is going to leak my real name and company name.

2022/5/21

【Third take】Designing a text data format

2020/09/14: I'm now writing it on GitHub instead.

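I've been writing up, on GitHub, a text format I more or less came up with on my own. I should have just written it in Japanese from the start, but for some reason I'm now hammering it out in English.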

https://github.com/senzry/sen

It's still a work in progress, but when is the right time to announce this kind of thing to the world? Also, I'd love to get some issues filed.

Below are the old notes, kept only for the record.

【Second take】Designing a text data format

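Memo: annotations, meaning notes passed to the processor. For example, when you want to read a file as a stream, it would help if the file could state explicitly that "this data has only one field". Keyword: multi.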

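Memo: I think the syntax for foreign key constraints needs more thought. Should composite keys be allowed? Vocabulary like "reference" or "ref" is probably better. Also, if "the data contains arrays" and "the data has refs" are both allowed, the notation loses its consistency: the same data structure can be written two ways. Either the parent holds an array, JSON-style, or, RDB-style, each child holds its parent's ID so that several children link to one parent.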
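The following minimal Python sketch makes the "two ways to write one structure" problem concrete. The author/book data and the field names "author_ref" and "books" are hypothetical and not part of any sen spec. The function folds RDB-style child rows, where each child holds its parent's id, into JSON-style embedded arrays. It shows that both forms carry the same information.

```python
from collections import defaultdict

def to_nested(parents: list[dict], children: list[dict],
              ref_field: str, array_field: str) -> list[dict]:
    """Fold RDB-style rows (child holds parent id) into JSON-style arrays."""
    by_parent: dict = defaultdict(list)
    for child in children:
        # Drop the ref: inside the parent's array it is implied by nesting.
        row = {k: v for k, v in child.items() if k != ref_field}
        by_parent[child[ref_field]].append(row)
    return [{**p, array_field: by_parent.get(p["id"], [])} for p in parents]

# RDB style: each book points at its author (hypothetical data).
authors = [{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}]
books = [{"id": 10, "title": "A", "author_ref": 1},
         {"id": 11, "title": "B", "author_ref": 1}]

nested = to_nested(authors, books, ref_field="author_ref", array_field="books")
```

Splitting arrays back into ref rows is just as mechanical. That is exactly why allowing both notations gives two spellings of one structure.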

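Memo: binary. Could we skip over it with a marker like "binary follows"? Or could we receive the length along with it and read it as a byte array? That sounds like a security nightmare, though. Or is that just me? Either way, it will probably need its own file extension. I'm starting to feel it's just binary data with a rich header.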
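A length-prefixed byte array is easy to skip. The catch is that the declared length comes from the file, so a reader must not trust it, which is probably the security worry above. Below is a minimal Python sketch. It assumes a hypothetical layout: a 4-byte big-endian length followed by raw bytes, with an arbitrary 16 MiB cap.

```python
import struct

MAX_BLOB = 16 * 1024 * 1024  # hypothetical cap: refuse blobs over 16 MiB

def read_blob(buf: bytes, offset: int = 0) -> tuple[bytes, int]:
    """Read a length-prefixed byte array starting at offset.

    Assumed layout (hypothetical): 4-byte big-endian unsigned length, then
    that many raw bytes. Returns (payload, offset just past the payload).
    """
    if offset + 4 > len(buf):
        raise ValueError("truncated length prefix")
    (length,) = struct.unpack_from(">I", buf, offset)
    start = offset + 4
    # Never trust the declared length: check it against a cap and the real size.
    if length > MAX_BLOB:
        raise ValueError(f"blob too large: {length} bytes")
    if start + length > len(buf):
        raise ValueError("declared length exceeds available data")
    return buf[start:start + length], start + length
```

A reader that doesn't care about the payload can jump straight to the returned offset. That covers the "skip it with a marker" idea as well.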

Memo: I wish I could get to know 藤原和典 (Kazunori Fujiwara), who wrote this PDF. I'd love to buy him a drink.

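Memo 20200703: While thinking about file systems, I realized that CSV being appendable is a big deal. It's silly to rewrite the entire text just to add one record. So... yeah. I'm thinking of defining rules for a directory hierarchy. On top of that, it lets you separate definitions and data into different files. I want to pin down the rules for references between pieces of data. Of course, a single file should still be fine too.
For example, give the folder itself a suffix like ".senvelope" and put a file like ".sendex" in it. Alongside the index, create a folder like "books.records", and name the files inside by GUID, by the name of the integrating system, by date, or whatever. Rule: no file in the package may be double-compressed (whatever single decompression you apply, the result must be plain text), but you are free to create your own subdirectories. Everything under books.records is walked recursively and every record is read. Since a record set ends up split across files, there should be a place to write its metadata as well.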
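Here is a rough Python sketch of the layout above, assuming a folder named like "library.senvelope" that contains "books.records". The ".records" suffix and recursive reading come from the memo. The ".senum" extension for record files, date-based file names, and sorted read order are assumptions. Appending a record touches only one file, so nothing else gets rewritten.

```python
from datetime import date
from pathlib import Path

def record_files(envelope: Path, record_set: str) -> list[Path]:
    """Every file under <envelope>/<record_set>.records, at any depth."""
    root = envelope / f"{record_set}.records"
    # rglob("*") walks subdirectories recursively; sorting fixes the read order.
    return sorted(p for p in root.rglob("*") if p.is_file())

def append_record(envelope: Path, record_set: str, line: str) -> Path:
    """Append one record to today's file; existing files are never rewritten."""
    root = envelope / f"{record_set}.records"
    root.mkdir(parents=True, exist_ok=True)
    target = root / f"{date.today().isoformat()}.senum"  # date-based name (assumed)
    with target.open("a", encoding="utf-8") as f:
        f.write(line.rstrip("\n") + "\n")
    return target
```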

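Memo: attention. When a file begins with something like '//"sen!"', a text editor MAY interpret it as sen, or it may not. I also think a version specifier is needed, something like "sen v1". I'd like to think breaking changes won't happen, but you can never rule them out: the standard character set or character encoding might change, for example. Everything up to '//"' is ASCII, so why not mix in an emoji? Something like 🍙. In UTF-8 that's "0xF0 0x9F 0x8D 0x99", and in UTF-16 it's "0xD83C 0xDF59". No idea about Shift_JIS.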
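Here is a sketch of encoding sniffing in Python. It assumes a hypothetical magic line that combines the '//"sen!"' idea, the 🍙 emoji, and a version tag. Encoding the emoji with Python's built-in codecs reproduces the byte values above: F0 9F 8D 99 in UTF-8, and the code units D83C DF59 in UTF-16. BOM handling is left out on purpose.

```python
from typing import Optional

# Hypothetical magic line: '//"sen!"' plus the rice-ball emoji and a version
# tag. The exact layout is an assumption, not a settled spec.
MAGIC = '//"sen!🍙" v1'

# Encodings to try, in order. A leading BOM is not handled here.
CANDIDATES = ("utf-8", "utf-16-be", "utf-16-le")

def sniff(head: bytes) -> Optional[str]:
    """Return the encoding whose encoded MAGIC starts the file, else None."""
    for enc in CANDIDATES:
        if head.startswith(MAGIC.encode(enc)):
            return enc
    return None
```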

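Memo: schema versioning. Data structures change as you operate a system, so they have to be versionable. For example, version a type like "Book#1.0.0". Whitespace in between is fine too. Then again, there's no way to notify clients that a schema version has changed. Would a ReadMe in meta be enough?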
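A minimal Python sketch of parsing "Book#1.0.0", with whitespace allowed around "#", follows. The grammar is extrapolated from that single example. The compatibility rule is a semver-style assumption; the memo doesn't define one.

```python
import re

# "Name#major.minor.patch", with optional whitespace around '#'. The grammar
# is an assumption extrapolated from the memo's "Book#1.0.0" example.
_VERSIONED = re.compile(
    r"\s*([A-Za-z_][A-Za-z0-9_]*)\s*#\s*([0-9]+)\.([0-9]+)\.([0-9]+)\s*")

Version = tuple[int, int, int]

def parse_type_ref(text: str) -> tuple[str, Version]:
    m = _VERSIONED.fullmatch(text)
    if m is None:
        raise ValueError(f"not a versioned type name: {text!r}")
    name, major, minor, patch = m.groups()
    return name, (int(major), int(minor), int(patch))

def is_compatible(reader: Version, data: Version) -> bool:
    """Assumed semver-style rule: same major, data's minor no newer than reader's."""
    return reader[0] == data[0] and data[1] <= reader[1]
```

Under this rule, a reader built for Book#1.2.0 accepts data written as Book#1.1.5. It rejects Book#1.3.0, which might carry fields the reader doesn't know. How to tell clients about such a change is still the open question above.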

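Memo: I was thinking about a spec that, CSV-style, lets you append records at the end of a file. That's when I realized primitive data needs configuration (metadata): how many bits an integer has, how many bits a float has, what the time offset is. We need both a defined standard meta and user-defined custom meta, because I don't want to treat meta as payload (the actual data). Also, unrelated, but I think we also need NaN, infinity, and exponent notation.
And the data that represents a set of records should be specified separately from "sen" itself, or it gets confusing. From "Enumerated Data", I think "senum" would work as the extension.
So: senvelope, the mark attached to a folder. sendex, which can handle data references. sen, which is self-contained in a single file. senum, which handles row data made of arbitrary types (or perhaps types sharing the same interface?).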
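The following minimal Python sketch shows primitive metadata kept out of the payload. A declared integer bit width bounds parsed integers, and the float grammar accepts exponents plus explicit NaN and infinity tokens. The meta key "int_bits" and the spellings "NaN" and "Infinity" are assumptions for illustration.

```python
import math
import re

# Hypothetical standard meta, declared once and kept out of the payload.
# The key name "int_bits" is an assumption for illustration.
STANDARD_META = {"int_bits": 16}

_INT = re.compile(r"[+-]?[0-9]+")
_FLOAT = re.compile(r"[+-]?(?:[0-9]+(?:\.[0-9]*)?|\.[0-9]+)(?:[eE][+-]?[0-9]+)?")
# Spellings for the special values are assumptions.
_SPECIAL = {"NaN": math.nan, "Infinity": math.inf,
            "+Infinity": math.inf, "-Infinity": -math.inf}

def parse_int(text: str, meta: dict) -> int:
    """Parse a signed integer and reject values outside meta["int_bits"]."""
    if _INT.fullmatch(text) is None:
        raise ValueError(f"not an integer literal: {text!r}")
    bits = meta["int_bits"]
    value = int(text)
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise OverflowError(f"{value} does not fit in a signed {bits}-bit integer")
    return value

def parse_float(text: str) -> float:
    """Strict decimal/exponent grammar plus explicit NaN and Infinity tokens."""
    if text in _SPECIAL:
        return _SPECIAL[text]
    if _FLOAT.fullmatch(text) is None:
        raise ValueError(f"not a float literal: {text!r}")
    return float(text)
```

The strict regexes are there because Python's own int() and float() accept more than a spec would want, such as "1_000" or "infinity".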

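Memo: immutable is mostly used on the web and in background processing (the server hands data marked immutable to the front end, and the user reads it), so I think it needs its own namespace. Is it OK to write use cases in an RFC? Well, maybe I don't have to.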

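Memo: a struct's id should not span multiple fields after all. The id is what corresponds to the Key when the data goes into a KVS, isn't it? Unique keys, on the other hand, can span multiple fields. Also, it shouldn't be called struct but type: not a structure, just a type. For enum, I think "suit" is a nice name, as in the suits of playing cards. A suit can hold a code (a string).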
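Here is a minimal Python sketch of those three ideas. It shows a type whose id is a single field usable as a KVS key, a composite unique key checked separately, and a "suit" whose members carry string codes. The Book fields, the (isbn, edition) unique key, and the Binding suit are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class Binding(Enum):
    """A "suit" (the memo's proposed name for enum); each member holds a string code."""
    HARDCOVER = "HC"
    PAPERBACK = "PB"

@dataclass(frozen=True)
class Book:
    id: str          # single-field id: maps directly onto a KVS key
    isbn: str
    edition: int
    binding: Binding

def load(rows: list[Book]) -> dict[str, Book]:
    """Index rows by id and enforce a hypothetical composite unique key (isbn, edition)."""
    store: dict[str, Book] = {}
    seen: set[tuple[str, int]] = set()
    for row in rows:
        if row.id in store:
            raise ValueError(f"duplicate id {row.id!r}")
        key = (row.isbn, row.edition)
        if key in seen:
            raise ValueError(f"duplicate unique key {key}")
        store[row.id] = row
        seen.add(key)
    return store
```

Binding("HC") decodes a code back into its member, so the code is what gets written to the file.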

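Memo: apparently OpenAPI uses status codes as property names. Are you kidding me? "responses" is obviously an enumeration, so why isn't it an array? And looking closer, there are property names like "/pets" and "application/json" too. What is going on? Property names that the user gets to define, like "/pets", are surely unacceptable. How are you supposed to map that into code?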
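Flipping such a map into an array, as the memo wants, takes one line. Below is a minimal Python sketch. The "status" field name is an assumption. Keys stay strings because OpenAPI also allows non-numeric keys such as "default".

```python
def responses_to_list(responses: dict) -> list[dict]:
    """Turn an OpenAPI-style responses map (status key -> object) into a list.

    The status key becomes an ordinary field, so a code generator sees one
    fixed schema instead of user-chosen property names.
    """
    return [{"status": code, **body} for code, body in responses.items()]
```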

Memo: grabbed the Internet-Draft XML template here (draft-davies-template-bare-07.xml).
Using the JSON spec as a model, I'll write it in Japanese for now.

<?xml version='1.0' encoding='utf-8'?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629-xhtml.ent">
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<rfc
      xmlns:xi="http://www.w3.org/2001/XInclude"
      category="info"
      docName="draft-HOGEHOGE-sen-00"
      ipr="trust200902"
      obsoletes=""
      updates=""
      submissionType="IETF"
      xml:lang="en"
      tocInclude="true"
      tocDepth="4"
      symRefs="true"
      sortRefs="true"
      version="3">
  <!-- xml2rfc v2v3 conversion 2.38.1 -->
  <!-- category values: std, bcp, info, exp, and historic
    ipr values: trust200902, noModificationTrust200902, noDerivativesTrust200902,
       or pre5378Trust200902
    you can add the attributes updates="NNNN" and obsoletes="NNNN" 
    they will automatically be output with "(if approved)" -->

 <!-- ***** FRONT MATTER ***** -->

 <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
        full title is longer than 39 characters -->

   <title abbrev="Abbreviated Title">Notation to replace JSON, CSV, YAML, and more</title>
    <seriesInfo name="Internet-Draft" value="draft-HOGEHOGE-sen-00"/>
    <!-- add 'role="editor"' below for the editors if appropriate -->

   <!-- Another author who claims to be an editor -->

   <author fullname="HOGE HOGE" initials="H.H." role="editor" surname="HOGE">
      <organization>HOGE</organization>
      <address>
        <postal>
          <street/>
          <!-- Reorder these if your country does things differently -->

         <city>Soham</city>
          <region/>
          <code/>
          <country>UK</country>
        </postal>
        <phone>+44 7889 488 335</phone>
        <email>[email protected]</email>
        <!-- uri and facsimile elements may also be added -->
     </address>
    </author>
    <date year="2020"/>
    <!-- If the month and year are both specified and are the current ones, xml2rfc will fill 
        in the current day for you. If only the current year is specified, xml2rfc will fill 
   in the current day and month for you. If the year is not the current one, it is 
   necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the 
   purpose of calculating the expiry date).  With drafts it is normally sufficient to 
   specify just the year. -->

   <!-- Meta-data Declarations -->

   <area>General</area>
    <workgroup>Network Working Group</workgroup>
    <!-- WG name at the upperleft corner of the doc,
        IETF is fine for individual submissions.  
   If this element is not present, the default is "Network Working Group",
        which is used by the RFC Editor as a nod to the history of the IETF. -->

   <keyword>HOGE</keyword>
    <!-- Keywords will be incorporated into HTML output
        files in a meta tag but they have no effect on text or nroff
        output. If you submit your draft to the RFC Editor, the
        keywords will be used for the search engine. -->

   <abstract>
      <t>Writing it in Japanese for now. Nothing below has been started yet.</t>
    </abstract>





  </front>
  <middle>
    <section numbered="true" toc="default">
      <name>Introduction</name>
      <t>The original specification of the xml2rfc format is in <xref target="RFC2629" format="default">RFC 2629</xref>.</t>
      <section numbered="true" toc="default">
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
       "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
       document are to be interpreted as described in <xref target="RFC2119" format="default">RFC 2119</xref>.</t>
      </section>
    </section>
    <section anchor="simple_list" numbered="true" toc="default">
      <name>Simple List</name>
      <t>List styles: 'empty', 'symbols', 'letters', 'numbers', 'hanging',
     'format'.</t>
      <ul spacing="normal">
        <li>First bullet</li>
        <li>Second bullet</li>
      </ul>
      <t> You can write text here as well.</t>
    </section>
    <section numbered="true" toc="default">
      <name>Figures</name>
      <t>Figures should not exceed 69 characters wide to allow for the indent
     of sections.</t>
      <t>Preamble text - can be omitted or empty.</t>
      <figure anchor="xml_happy">
        <artwork align="left" name="" type="" alt=""><![CDATA[
+-----------------------+
| Use XML, be Happy :-) |
|_______________________|
           ]]></artwork>
      </figure>
      <t>Cross-references allowed in pre- and postamble. <xref target="min_ref" format="default"/>.</t>
      <t>The CDATA means you don't need to escape meta-characters (especially
     &lt; (&amp;lt;) and &amp; (&amp;amp;)) but is not essential.
     Figures may also have a title attribute but it won't be displayed unless
     there is also an anchor. White space, both horizontal and vertical, is
     significant in figures even if you don't use CDATA.</t>
    </section>
    <!-- This PI places the pagebreak correctly (before the section title) in the text output. -->

   <section numbered="true" toc="default">
      <name>Subsections and Tables</name>
      <section numbered="true" toc="default">
        <name>A Subsection</name>
        <t>By default 3 levels of nesting show in the table of contents, but that
       can be adjusted with the "tocDepth" attribute of the root
       element.</t>
      </section>
      <section numbered="true" toc="default">
        <name>Tables</name>
        <t>.. are very similar to figures:</t>
        <t>Tables use th elements inside thead to define column headers.
         Every body cell then has a td element for its content.</t>
        <table anchor="table_example" align="center">
          <name>A Very Simple Table</name>
          <thead>
            <tr>
              <th align="center">ttcol #1</th>
              <th align="center">ttcol #2</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td align="center">c #1</td>
              <td align="center">c #2</td>
            </tr>
            <tr>
              <td align="center">c #3</td>
              <td align="center">c #4</td>
            </tr>
            <tr>
              <td align="center">c #5</td>
              <td align="center">c #6</td>
            </tr>
          </tbody>
        </table>
        <t>which is a very simple example.</t>
      </section>
    </section>
    <section anchor="nested_lists" numbered="true" toc="default">
      <name>More about Lists</name>
      <t>Lists with 'hanging labels': the list item is indented the amount of
     the hangIndent: </t>
      <dl newline="true" spacing="normal" indent="8">
        <dt>short</dt>
        <dd>With a label shorter than the hangIndent.</dd>
        <dt>fantastically long label</dt>
        <dd>With a label longer than the
         hangIndent.</dd>
        <dt>vspace_trick</dt>
        <dd>Forces the new
         item to start on a new line.</dd>
      </dl>
      <!-- It would be nice to see the next piece (12 lines) all on one page. -->

     <t>Multiple paragraphs in a list item, using nested
     &lt;t&gt; elements: </t>
      <ol spacing="normal" type="a">
        <li>First, a short item.</li>
        <li>
          <t>Second, a longer list item.</t>
          <t> And
         something that is a separate paragraph.</t>
        </li>
      </ol>
      <t>Simple indented paragraph using the "empty" style: </t>
      <ul empty="true" spacing="normal">
        <li>The quick, brown fox jumped over the lazy dog and lived to fool
         many another hunter in the great wood in the west.</li>
      </ul>
      <section numbered="true" toc="default">
        <name>Numbering Lists across Lists and Sections</name>
        <t>Numbering items continuously although they are in separate
       &lt;ol&gt; elements, possibly in separate sections, by giving the
       lists the same "group" attribute.</t>
        <t>First list: </t>
        <ol group="reqs" spacing="normal" type="R%d">
          <li>#1</li>
          <li>#2</li>
          <li>#3</li>
        </ol>
        <t> Specify the indent explicitly so that all the items line up
       nicely.</t>
        <t>Second list: </t>
        <ol group="reqs" spacing="normal" type="R%d">
          <li>#4</li>
          <li>#5</li>
          <li>#6</li>
        </ol>
      </section>
      <section numbered="true" toc="default">
        <name>Where the List Numbering Continues</name>
        <t>List continues here.</t>
        <t>Third list: </t>
        <ol group="reqs" spacing="normal" type="R%d">
          <li>#7</li>
          <li>#8</li>
          <li>#9</li>
          <li>#10</li>
        </ol>
        <t> The end of the list.</t>
      </section>
    </section>
    <section anchor="codeExample" numbered="true" toc="default">
      <name>Example of Code or MIB Module To Be Extracted</name>
      <t>The &lt;artwork&gt; element has a number of extra attributes
       that can be used to substitute a more aesthetically pleasing rendition
       into HTML output while continuing to use the ASCII art version in the
       text and nroff outputs (see the xml2rfc README for details). It also
       has a "type" attribute. This is currently ignored except in the case
       'type="abnf"'. In this case the "artwork" is expected to contain a
       piece of valid Augmented Backus-Naur Format (ABNF) grammar. This will
       be syntax checked by xml2rfc and any errors will cause a fatal error
       if the "strict" processing instruction is set to "yes". The ABNF will
       also be colorized in HTML output to highlight the syntactic
       components. Checking of additional "types" may be provided in future
       versions of xml2rfc.</t>
      <artwork name="" type="" align="left" alt=""><![CDATA[

/**** an example C program */

#include <stdio.h>
#include <stdlib.h>

int
main(int argc, char *argv[])
{
   int i;

   printf("program arguments are:\n");
   for (i = 0; i < argc; i++) {
       printf("%d: \"%s\"\n", i, argv[i]);
   }

   return EXIT_SUCCESS;
} /* main */

/* end of file */

           ]]></artwork>
    </section>
    <section anchor="Acknowledgements" numbered="true" toc="default">
      <name>Acknowledgements</name>
      <t>This template was derived from an initial version written by Pekka
     Savola and contributed by him to the xml2rfc project.</t>
      <t>This document is part of a plan to make xml2rfc indispensable <xref target="DOMINATION" format="default"/>.</t>
    </section>
    <!-- Possibly a 'Contributors' section ... -->

   <section anchor="IANA" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>This memo includes no request to IANA.</t>
      <t>All drafts are required to have an IANA considerations section (see
     <xref target="RFC5226" format="default">Guidelines for Writing an IANA Considerations Section in RFCs</xref> for a guide). If the draft does not require IANA to do
     anything, the section contains an explicit statement that this is the
     case (as above). If there are no requirements for IANA, the section will
     be removed during conversion into an RFC by the RFC Editor.</t>
    </section>
    <section anchor="Security" numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>All drafts are required to have a security considerations section.
     See <xref target="RFC3552" format="default">RFC 3552</xref> for a guide.</t>
    </section>
  </middle>
  <!--  *****BACK MATTER ***** -->

 <back>
    <!-- References split into informative and normative -->

   <!-- There are 2 ways to insert reference entries from the citation libraries:
    1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
    2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
       (for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")

    Both are cited textually in the same manner: by using xref elements.
    If you use the PI option, xml2rfc will, by default, try to find included files in the same
    directory as the including file. You can also define the XML_LIBRARY environment variable
    with a value containing a set of directories to search.  These can be either in the local
    filing system or remote ones accessed by http (http://domain/dir/... ).-->

   <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->
     <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <seriesInfo name="DOI" value="10.17487/RFC2119"/>
            <seriesInfo name="RFC" value="2119"/>
            <seriesInfo name="BCP" value="14"/>
            <author initials="S." surname="Bradner" fullname="S. Bradner">
              <organization/>
            </author>
            <date year="1997" month="March"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized. This document defines these words as they should be interpreted in IETF documents.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
        </reference>
        <reference anchor="min_ref">
          <!-- the following is the minimum to make xml2rfc happy -->

       <front>
            <title>Minimal Reference</title>
            <author initials="authInitials" surname="authSurName">
              <organization/>
            </author>
            <date year="2006"/>
          </front>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <!-- Here we use entities that we defined at the beginning. -->

     <reference anchor="RFC2629" target="https://www.rfc-editor.org/info/rfc2629" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2629.xml">
          <front>
            <title>Writing I-Ds and RFCs using XML</title>
            <seriesInfo name="DOI" value="10.17487/RFC2629"/>
            <seriesInfo name="RFC" value="2629"/>
            <author initials="M." surname="Rose" fullname="M. Rose">
              <organization/>
            </author>
            <date year="1999" month="June"/>
            <abstract>
              <t>This memo presents a technique for using XML (Extensible Markup Language) as a source format for documents in the Internet-Drafts (I-Ds) and Request for Comments (RFC) series.  This memo provides information for the Internet community.</t>
            </abstract>
          </front>
        </reference>
        <reference anchor="RFC3552" target="https://www.rfc-editor.org/info/rfc3552" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3552.xml">
          <front>
            <title>Guidelines for Writing RFC Text on Security Considerations</title>
            <seriesInfo name="DOI" value="10.17487/RFC3552"/>
            <seriesInfo name="RFC" value="3552"/>
            <seriesInfo name="BCP" value="72"/>
            <author initials="E." surname="Rescorla" fullname="E. Rescorla">
              <organization/>
            </author>
            <author initials="B." surname="Korver" fullname="B. Korver">
              <organization/>
            </author>
            <date year="2003" month="July"/>
            <abstract>
              <t>All RFCs are required to have a Security Considerations section. Historically, such sections have been relatively weak.  This document provides guidelines to RFC authors on how to write a good Security Considerations section.   This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
        </reference>
        <reference anchor="RFC5226" target="https://www.rfc-editor.org/info/rfc5226" xml:base="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.5226.xml">
          <front>
            <title>Guidelines for Writing an IANA Considerations Section in RFCs</title>
            <seriesInfo name="DOI" value="10.17487/RFC5226"/>
            <seriesInfo name="RFC" value="5226"/>
            <author initials="T." surname="Narten" fullname="T. Narten">
              <organization/>
            </author>
            <author initials="H." surname="Alvestrand" fullname="H. Alvestrand">
              <organization/>
            </author>
            <date year="2008" month="May"/>
            <abstract>
              <t>Many protocols make use of identifiers consisting of constants and other well-known values.  Even after a protocol has been defined and deployment has begun, new values may need to be assigned (e.g., for a new option type in DHCP, or a new encryption or authentication transform for IPsec).  To ensure that such quantities have consistent values and interpretations across all implementations, their assignment must be administered by a central authority.  For IETF protocols, that role is provided by the Internet Assigned Numbers Authority (IANA).</t>
              <t>In order for IANA to manage a given namespace prudently, it needs guidelines describing the conditions under which new values can be assigned or when modifications to existing values can be made.  If IANA is expected to play a role in the management of a namespace, IANA must be given clear and concise instructions describing that role.  This document discusses issues that should be considered in formulating a policy for assigning values to a namespace and provides guidelines for authors on the specific text that must be included in documents that place demands on IANA.</t>
              <t>This document obsoletes RFC 2434.  This document specifies an Internet Best  Current Practices for the Internet Community, and requests discussion and  suggestions for improvements.</t>
            </abstract>
          </front>
        </reference>
        <!-- A reference written by an organization, not a person. -->

     <reference anchor="DOMINATION" target="http://www.example.com/dominator.html">
          <front>
            <title>Ultimate Plan for Taking Over the World</title>
            <author>
              <organization>Mad Dominators, Inc.</organization>
            </author>
            <date year="1984"/>
          </front>
        </reference>
      </references>
    </references>
    <section anchor="app-additional" numbered="true" toc="default">
      <name>Additional Stuff</name>
      <t>This becomes an Appendix.</t>
    </section>
    <!-- Change Log

v00 2006-03-15  EBD   Initial version

v01 2006-04-03  EBD   Moved PI location back to position 1 -
                     v3.1 of XMLmind is better with them at this location.
v02 2007-03-07  AH    removed extraneous nested_list attribute,
                     other minor corrections
v03 2007-03-09  EBD   Added comments on null IANA sections and fixed heading capitalization.
                     Modified comments around figure to reflect non-implementation of
                     figure indent control.  Put in reference using anchor="DOMINATION".
                     Fixed up the date specification comments to reflect current truth.
v04 2007-03-09 AH     Major changes: shortened discussion of PIs,
                     added discussion of rfc include.
v05 2007-03-10 EBD    Added preamble to C program example to tell about ABNF and alternative 
                     images. Removed meta-characters from comments (causes problems).

v06 2010-04-01 TT     Changed ipr attribute values to latest ones. Changed date to
                     year only, to be consistent with the comments. Updated the 
                     IANA guidelines reference from the I-D to the finished RFC.
v07 2020-01-21 HL    Converted the template to use XML schema version 3.
    -->
 </back>
</rfc>