Developing the perfect CMS / Site Platform

Posted by fx1024, 07-02-2008, 05:36 AM
After many years of PHP coding, I realized that I kept writing and rewriting the same functions and features every time I created a new website. I finally decided to create a high-performance default site that I could use as the base for every new site. Every new feature in a new site would be just a new module which, once ready, could be used in any other site built on the default code.
The reason I didn't choose an existing CMS is performance. In order to be flexible enough, most CMSes are extremely CPU/memory intensive. If you are lucky enough to get a successful site, at some point you either have to rewrite the whole site or get unbelievable hardware to host just a forum.
At this point the default site has the following basic features:
* New member registration with admin-defined custom fields and automatic registration form creation.
* Session management, with or without access to the DB.
* A multi-purpose, unlimited category structure. The same structure can hold e.g. article categories and subcategories, forums and subforums, even image libraries.
* Module management: addition, removal and de/activation of each module.
* Module settings, and feature access per module/category/user group.
* Connection of each module to a category or category branch.
* User group management. Every user can be a member of up to 5 different admin-defined groups.
* Skinnable. Every skin can have its own HTML, PHP templates and CSS.
* Translation management. The site can be multilingual, with each translation automatically hardcoded in the template files.
* Templates in the form of PHP functions.
* HTML libraries of common HTML element structures.
* AJAX library (JSON based).
* Ability to get semi-static data (e.g. the category structure) from the DB, flat files, or the APC cache.
Existing modules: the Articles module, which has the following features, defined by the admin per category:
* Multiple translations of each article
* Username / Author name
* Author bio
* Comments
* Rating
* Related articles
* Keywords
* Path
* Date
* Email, Print, PDF links
Planned modules: Messages, Instant Messenger, Forums, Chat rooms, Blogs, Image Library, Products, Dating.
OK, please tell me what else you would consider mandatory in such a system, as a basic feature and/or as a module. Any ideas are most welcome!

Posted by P-nut, 07-02-2008, 09:17 AM
Sounds pretty neat! My biggest thing when using a CMS is the ability to make it look like *mine* -- that is, to be able to totally customize the frontend so someone stumbling on my site doesn't think "Oh, lovely, another *Nuke site". I also prefer to have the ability to have user-friendly URLs - not something like sitename.com/com,content/4/lorem_ipsum/443,3433.html. More like sitename.com/news/fun-category/a-funny-article.html. These are just my personal preferences; YMMV. Good luck with your CMS!

Posted by JLHC, 07-02-2008, 09:29 AM
Great idea. I suggest making it able to integrate with existing CMSes as well.

Posted by fx1024, 07-02-2008, 09:30 AM
P-Nut, it's designed so that the code outputs just values; the skins/templates define the presentation. If you consider that templates can be PHP code instead of simple HTML, the customization is unlimited! I thought about the issue of friendly URLs, but the only problem is with articles/threads having the same title. Does anyone with SEO experience know if examplesite.com/articles/this_is_an_article_title_12345 is much worse than examplesite.com/articles/this_is_an_article_title?
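
For readers weighing the two URL forms above, here is a minimal sketch in Python (not the PHP of the CMS under discussion) of the ID-suffixed variant. The function names `slugify`, `article_url` and `id_from_url` are hypothetical and chosen only for illustration; the point is that the trailing ID keeps two articles with the same title distinct, and that the article can be looked up from the ID alone.

```python
import re
import unicodedata


def slugify(title):
    """Lowercase the title, drop accents, and join word runs with underscores."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    return re.sub(r"[^a-z0-9]+", "_", ascii_title.lower()).strip("_")


def article_url(title, article_id):
    """Build a friendly URL path; the numeric suffix disambiguates equal titles."""
    return f"/articles/{slugify(title)}_{article_id}"


def id_from_url(path):
    """Recover the article ID from the trailing digits, or None if absent."""
    match = re.search(r"_(\d+)$", path)
    return int(match.group(1)) if match else None


print(article_url("This is an Article Title", 12345))
print(id_from_url("/articles/this_is_an_article_title_12345"))
```

Because only the trailing ID is used for the lookup, the title part of the URL can change later without breaking old links.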

Posted by D3m0n, 07-02-2008, 09:51 AM
Is it open source? Do you have any sites running it that we can take a look at? It seems pretty cool, but you will need the help of other coders too. You could create a new project or something with it so that it becomes more popular. Good luck anyway.

Posted by fx1024, 07-02-2008, 11:16 AM
At this point I'm working mainly at the functionality level. I'll upload it as soon as I've got the presentation layer ready. Open source is a consideration, but I don't know if there is any interest. Anyway, there are tons of ready-made CMSes out there, and this one will be very special (in a bad way)

Posted by fx1024, 07-02-2008, 11:18 AM
Thanks JLHC. Integration with other CMSes is out of our scope.

Posted by P-nut, 07-04-2008, 11:40 AM
Maybe instead of integrating with other CMSes you could have an import script for the major ones like *Nuke, Drupal, Joomla, etc. People will probably be more likely to switch to yours if they don't have to import all their content by hand.

Posted by fx1024, 07-04-2008, 02:34 PM
P-Nut, if I go open source, this is definitely a must!

Posted by Christian, 07-04-2008, 03:04 PM
Even if you go paid, it really should be a must as well. If your CMS is a huge hit, you don't want to just limit your base to only new users.

Posted by Snargleflap, 07-05-2008, 04:09 AM
Yeah no kidding. You just want to make it better, not kill yourself

Posted by Burhan, 07-05-2008, 04:25 AM
Looks like you are just re-creating Drupal. I wish you luck. Do not hard code translations into the template files. This is a bad idea (trust me from tons of experience in translations).

Posted by fx1024, 07-05-2008, 12:52 PM
The options we are considering are to keep it in-house for our own site development or to go open source at some point. Business-wise, I don't think there is room in this market for a new product.

Posted by fx1024, 07-05-2008, 01:07 PM
Burhan, why do you think so? The hardcoding of the translations is completely transparent to the user/admin. The system is designed in such a way that you can add or change any translation or text without messing with the templates. When you do so, the system simply recreates the stored template files. I don't really see a reason to load a ton of variables or constants every time I load a page. Another interesting point, I think, is that there are two kinds of translated text: the first kind can be used inside code (in the form of constants), and the second is hardcoded only in the template file that is going to use it. Another point I just remembered is the template versioning system! You can change any template without losing the previous changes, you can go back at any time, and of course you can keep a version log and see the differences between two versions.

Posted by Burhan, 07-05-2008, 05:57 PM
If you are loading the translations externally (ie, gettext) then you don't need to "hard code" or soft code, or medium code them at all. This is the preferred method that allows flexibility both ways. I assume that the system recreates (by that I mean, parses and then stores in cache) the template on a language change or addition. This could get tiresome if you have a lot of templates or significant language changes. Also, this might affect loading of the right template on a language edit. Surely your system is not compiling the entire template file for the smallest change? Imagine I change "colour" to "color" in one template file (which is about 50 lines), it will then recreate the template again? You can avoid both scenarios (constant and hard coding) by using a known translation method (like gettext and friends). That template changelog is a nice feature.

Posted by fx1024, 07-06-2008, 05:58 AM
Burhan, I appreciate your opinion. Let me describe exactly how the template/translation system works. First of all we have the translations. You can define any number of supported languages, with their encodings. Then you define standard LangTexts per module, and then the translation of each LangText for each supported language. When you write a template you use only the LangText (e.g. "{pls_insert_your_username}:"). When you save the template, the system replaces every occurrence of this LangText with each translation, and saves one template file per supported language. For example, if the template was stored in the common.tpl.php file, the system will recreate and save /en/common.tpl.php, /gr/common.tpl.php and /it/common.tpl.php if English, Greek and Italian are defined as system languages. More or less the same happens when you change a translation. Finally, the system clears the templates stored in the APC cache.
Inside the script itself, each template is a PHP function (e.g. echo template_login_form();) that can get the variables it uses directly (it finds the vars used and defines them as globals) or have them passed as parameters. The developer doesn't need to mess with translations; the system defines a $langinclude var in order to get the correct translation each time. I used the approach of included PHP function files in order to avoid the eval and str_replace methods I've seen around. It seems that, performance-wise, the best way to evaluate a var is inside double quotes.
So when I say "recreate" I mean create again and store each template file in the file system. The developer/admin has nothing to do with this "recreation": you just change the translation or template you want, and if the system has to rewrite everything underneath, it's not your problem. The general philosophy I try to follow is: get everything (scripts, templates, static and semi-static data) from the APC cache, and frequently changing data from MySQL.
My first target is to create a CMS that will be able to serve 1M pages per day on an entry-level server.
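
The save-time substitution described in this post can be sketched roughly as follows. This is a Python illustration under stated assumptions, not fx1024's actual PHP code: the `{lang_text}` token syntax follows the example in the post, while the names `render_for_lang`, `save_template` and the shape of the `catalog` dictionary are hypothetical.

```python
import re
from pathlib import Path

# Assumption: a LangText token is a lowercase identifier wrapped in braces.
TOKEN = re.compile(r"\{([a-z0-9_]+)\}")


def render_for_lang(template_src, translations):
    """Replace each {lang_text} token with its translation; keep unknown tokens."""
    return TOKEN.sub(lambda m: translations.get(m.group(1), m.group(0)), template_src)


def save_template(root, name, template_src, catalog):
    """Write one pre-translated copy per language: <root>/<lang>/<name>."""
    written = []
    for lang, translations in catalog.items():
        target = Path(root) / lang / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(render_for_lang(template_src, translations), encoding="utf-8")
        written.append(target)
    return written


catalog = {
    "en": {"pls_insert_your_username": "Username"},
    "it": {"pls_insert_your_username": "Nome utente"},
}
print(render_for_lang("<label>{pls_insert_your_username}:</label>", catalog["en"]))
```

The cost of translation is paid once, when a template or translation is saved, so a page request only has to load the file for the visitor's language.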

Posted by Burhan, 07-06-2008, 05:21 PM
You should have a look at gettext, which provides this message catalog in an optimized format and does what you describe above -- replaces tokens with translations. Since it's done in C and has a PHP binding, it will be a lot faster than your PHP code (even if you include APC). Good stuff

Posted by fx1024, 07-07-2008, 03:11 AM
Burhan, gettext is really interesting. I didn't know about it and I will definitely take a closer look. From a first look it seems that, even though it's written in C, the translation is done every time a page is displayed. The way I have designed the system, the translation/replacement is done only when you change something (a translation or a template); otherwise there's no such need. Thank you

Posted by Czaries, 07-09-2008, 10:59 AM
I think you've got a great vision for this project, but I fear you may have already started off on the wrong foot. If you try to compete with other CMS products starting with a feature list, it will be an uphill battle all the way. There are many existing products on the market that have feature lists miles long, and they are pretty much guaranteed to beat any new product in features for years to come. I would strongly advise that you focus on usability instead, especially for end users who are not tech-savvy. This is where most CMS products on the market fail miserably, and where there is the most opportunity for your success. Build something that is simple and intuitive for most people (much easier said than done), and let features follow second. I would much rather give my client a CMS with 20% of the features that is 100% more usable and intuitive than one with a feature list 4 miles long that will require several hours of my time for training and support.

Posted by WHTer, 07-09-2008, 11:45 AM
You definitely need a way to separate the scripting from the layout and styling.

Posted by fx1024, 07-10-2008, 03:49 PM
Czaries, WHTer, thank you for your participation.
Czaries, maybe the thread title is a bit misleading. As I described in my initial post, my purpose is not to create another (competing) CMS, but to create a "default" site that I can use to avoid rewriting everything each time I need to make a new site. I don't know if it should be called a "CMS", a "default site" or a "framework". If there is interest in such a thing, maybe I'll go open source (the only major reason I wouldn't is that I'm too bored to document it in detail). I agree completely with your opinions about the CMS market and the need for a simple CMS, but my focus in this project is simple:
1. Re-usability. I'm even thinking of a form in which the system would be installed just once per server, to serve multiple domains/sites.
2. Performance. I hate having to rewrite/optimize the code or upgrade a server's hardware when a site is doing well. (I think that when a site is doing well, you must focus on the business, not the hardware and software.)
WHTer, this has been done. The system already supports skins (packages of templates, images and CSS) and the optional use of presentation-level scripting (PHP) within the templates.

Posted by Mike - Limestone, 07-10-2008, 04:14 PM
I was thinking Drupal or even Ruby on Rails when I read the original post. Nonetheless, another product out there would be excellent. I would venture to guess that some money could be made, as well, provided that it was easy to use -- especially if it included some top-notch templates. -mike

Posted by Steve_Arm, 07-10-2008, 04:30 PM
With all these features you are describing, I don't see how performance gets priority. I would really be interested in seeing some benchmarks. Also, you keep saying "we"; is there a team behind it?

Posted by andren, 07-10-2008, 11:24 PM
No, no, it is Typo3.

Posted by zuborg, 07-11-2008, 06:36 AM
Good luck. And use caching everywhere, otherwise your CMS will eat CPU/RAM like other CMSes do. Don't do the web server's work: it's Apache's/nginx's job to deliver content; the purpose of a CMS is to manage it. Keep security in mind as well.

Posted by fx1024, 07-11-2008, 01:51 PM
Mike - Limestone, I don't have such ambitions. I prefer to build sites... rather than CMSes.
Steve_Arm, dear friend... I know that flexibility and performance are two opposing forces in the nature of IT. But after 5 years of fine-tuning and optimizing large-scale PHP sites, I believe that I can find balance (and inner peace some time, LOL). I have done some profiling, but the results only have value when they are compared to something else. I'll install Drupal (or any other CMS you can recommend) in order to check it against my system. On my Windows PC, the display-article page (full featured: article, comments, votes, related articles, author bio etc.) takes 40-85ms the first time and 4-5ms when semi-cached (header and footer dynamic, page content static).
andren, zuborg, I just finished the caching subsystem! It uses file system caching (not the APC cache) and works without DB access for the cached file's TTL.
I have a related question (anyone expert in this field, please help): what is the optimal number of files/directories per directory on a Linux/Unix system, performance-wise? Is it better to store 10000 files in one directory, 1000 in each of 10 directories, or 100 in each of 100 directories?

Posted by Burhan, 07-12-2008, 08:44 AM
It is better to have more directories than files; however, if you don't keep track of your inodes, you might run out of them. In theory, that could lead to a situation where you have disk space but are unable to create any files (in Linux, a directory is just a file) because you don't have any inodes left. There are also limits on the number of subdirectories a directory can have (I believe it's 32000 for ext3).
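
One way to keep track of inodes, as suggested above, is to ask the filesystem how many it has left. The sketch below is in Python and uses the standard `os.statvfs` call (Unix only); the helper name `inode_usage` is hypothetical. Some filesystems report zero total inodes because they allocate them dynamically, which the sketch treats as 0% used.

```python
import os


def inode_usage(path="/"):
    """Return (total inodes, free inodes, percent used) for the filesystem at path."""
    st = os.statvfs(path)
    total, free = st.f_files, st.f_ffree
    used_pct = 100.0 * (total - free) / total if total else 0.0
    return total, free, used_pct


total, free, used_pct = inode_usage("/")
print(f"inodes: {total} total, {free} free, {used_pct:.1f}% used")
```

The same numbers are what `df -i` prints on the command line, so a cache cleanup job could check them before deciding how aggressively to delete stale files.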

Posted by Xeentech, 07-12-2008, 05:44 PM
Since you're not using a DB my method won't fit exactly, but it might inspire you, I guess. When a file is inserted into our 'file' table in the DB, the generated integer ID is returned, and I use this to build the path where it will be stored. Say the returned ID is 12345678; the path will be /.../file/12/34/12345678. This way we'll only ever have 10,000 dirs, and only 100 files per dir, which should be nice and easy for the OS's filesystem to look up. When asked for a certain file it only has 00-99 in the root dir to look through, and the same in the next. If you only use 3 steps like we have, then once you're storing 10 million files the last dir has 100000 files in it.
Edit: forgot to mention this too. On one site I installed I figured it would be efficient to render the returned ID as hex: 12345678 is then BC614E, and we split that up as /B/C/614E. This way the root and first-level dirs have 16 children each. If you wanted to be sure the files were going to spread out evenly across the filesystem you could use the tail end of the number: /78/56/12345678. We've not hit any performance issues keeping the files two dirs deep like this, though we've only done the most basic of benchmarks. When we time an app that "stats" the file, it takes on average as long to stat a file in our home dir as it does to get a /xx/xx/xxxxxx file.
Last edited by Xeentech; 07-12-2008 at 05:56 PM.
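
The three path layouts mentioned in this post can be written down in a few lines. This is a Python sketch for illustration only; the function names are hypothetical, and the zero-padding of short IDs is an assumption the post does not spell out.

```python
def leading_digits_path(file_id):
    """12345678 -> 12/34/12345678 (first two digit pairs pick the directories)."""
    s = f"{file_id:08d}"  # assumption: pad short IDs to 8 digits
    return f"{s[0:2]}/{s[2:4]}/{s}"


def hex_path(file_id):
    """12345678 -> B/C/614E (one hex digit per level, 16 children per directory)."""
    h = f"{file_id:06X}"  # assumption: pad short IDs to 6 hex digits
    return f"{h[0]}/{h[1]}/{h[2:]}"


def trailing_digits_path(file_id):
    """12345678 -> 78/56/12345678 (last digits vary fastest, so IDs spread evenly)."""
    s = f"{file_id:08d}"
    return f"{s[-2:]}/{s[-4:-2]}/{s}"


for make_path in (leading_digits_path, hex_path, trailing_digits_path):
    print(make_path(12345678))
```

With sequential IDs the leading-digits layout fills one directory at a time, while the trailing-digits layout rotates through all of them, which is the even spread the post refers to.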

Posted by fx1024, 07-13-2008, 03:39 AM
Xeentech, this is almost the way I implemented the cache. The reason I asked about the optimal directories/files ratio was to decide whether to use a 12/34/1234file.cache pattern or a 1/2/3/4/1234file.cache pattern. The hex conversion is really clever! What bothers me is the inode issue that Burhan mentioned. I'm not a Linux expert, and I hadn't realized there was this limitation, which in many cases seems reachable.
I have designed the cache subsystem in such a way that a file can be found without access to the DB, using only the number of the module that created it, an ID, and an extension. This way each module can save its own files and different formats of them (e.g. the article with id=1234 and the print version of the article with id=1234). The TTL of the cache file is written inside the file. When a cached file is requested, the subsystem reads its first line (if the file exists) and checks the TTL; if it's OK it continues and reads the whole file, if not it returns false. Given the inode issue, I think I must set up a cron job that runs through the cached files, reads their first line and decides whether to delete them. (But I don't like this approach...)
Xeentech, do you delete cached files when they reach their TTL, or just keep them and update them when necessary?
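
A rough Python sketch of the lookup described in this post follows: the path is derived from module number, ID and extension alone, and the expiry time sits on the first line of the file. The exact file naming, the use of an absolute expiry timestamp rather than a relative TTL, and the names `cache_path`, `cache_write` and `cache_read` are all assumptions made for illustration; the original is PHP and returns false where this returns None.

```python
import os
import time
from pathlib import Path


def cache_path(root, module_id, item_id, ext):
    """Derive the file location from module, ID and extension; no DB lookup."""
    s = f"{item_id:04d}"
    return Path(root) / str(module_id) / s[0:2] / s[2:4] / f"{s}.{ext}.cache"


def cache_write(path, body, ttl_seconds, now=None):
    """Store the expiry timestamp on the first line, followed by the cached body."""
    now = time.time() if now is None else now
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(f"{int(now + ttl_seconds)}\n{body}", encoding="utf-8")
    os.replace(tmp, path)  # rename into place so readers never see a partial file


def cache_read(path, now=None):
    """Read only the first line; return the body if still fresh, otherwise None."""
    now = time.time() if now is None else now
    try:
        with open(path, encoding="utf-8") as fh:
            expires = int(fh.readline())
            if expires <= now:
                return None
            return fh.read()
    except (FileNotFoundError, ValueError):
        return None
```

A typical call would be `cache_read(cache_path("/var/cache/site", 7, 1234, "print"))`, where the root directory and module number 7 are made up for the example. A stale file costs only one short read before the caller falls back to regenerating the page.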

Posted by Xeentech, 07-13-2008, 10:11 AM
With our setup, checking that stuff is less of an issue, as the TTL, last access and creation times are in the database table. Getting a list of files and then opening each one to check the TTL on the first line sounds like a lot of work. You're going to use an FD for each file you open, which could be slow. I'd "stat" each file for its creation (or last modification) time, if you can work out the TTL from that, and since you'd be doing that you get the "last access" time from the OS's filesystem for free. In Perl, $atime, $mtime and $ctime are the access time, modification time and inode change time (on Unix, ctime is not a true creation time). Our cache system creates a new file every time a page is requested that doesn't have a "fresh" cache. Stale cache is then deleted after the page is delivered; any stale cache, that is, not just for that page. In some installs this is cleaned up by a cron job like the one you're planning.
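
The "stat instead of open" idea can be sketched like this in Python. It assumes every cache file shares one TTL and uses the modification time as the file's age, since Unix filesystems do not reliably expose a creation time; the names `is_stale` and `sweep` are hypothetical.

```python
import os
import time


def is_stale(path, ttl_seconds, now=None):
    """Decide staleness from the file's mtime alone, without opening the file."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime > ttl_seconds


def sweep(root, ttl_seconds, now=None):
    """Walk the cache tree and delete stale files; return the paths removed."""
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if is_stale(full, ttl_seconds, now):
                    os.remove(full)
                    removed.append(full)
            except FileNotFoundError:
                pass  # another process removed the file first
    return removed
```

A function like `sweep` could run from cron, or after a page has been delivered, matching the two cleanup styles mentioned in the post.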

Posted by zuborg, 07-15-2008, 08:19 AM
Directory manipulation is a much more difficult kernel operation than file manipulation, due to its transactional nature. It requires more locking and more disk operations to create/update/delete a directory node, so it's good practice to keep the number of directories from getting too high. On the other hand, storing all files in a single directory introduces lock waits on concurrent operations, so it's good practice to spread files across separate directories. The directory nesting level should be low, to decrease the number of 'path to inode' lookups. I would recommend a 1-level or (for a large number of files) 2-level directory structure with a maximum of 100-1000 elements per directory, for example:
1-level, 999 max number (for millions of files):
dir/000/some_long_file_name.txt
dir/001/some_other_file.txt
2-level, 256 max number (tens of millions of files, 256 * 256 * 1000 = about 65M):
dir/18/4a/a9ced3dad556814ed46042de696e184a.md5
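
The two layouts recommended above could be generated along these lines. This is a Python sketch; the choice of the last four hex digits of the MD5 for the two directory levels is read off the example path in the post, while the function names and the use of `file_id % 1000` for the 1-level bucket are assumptions.

```python
import hashlib


def one_level_path(file_id, filename, root="dir"):
    """Bucket by ID modulo 1000: dir/000 .. dir/999."""
    return f"{root}/{file_id % 1000:03d}/{filename}"


def two_level_path(key, root="dir"):
    """Use the last four hex digits of the MD5 as two levels of 256 directories."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{root}/{digest[-4:-2]}/{digest[-2:]}/{digest}.md5"


print(one_level_path(12001, "some_other_file.txt"))
print(two_level_path("/articles/this_is_an_article_title_12345"))
```

Hashing the key gives an even spread across directories regardless of how the keys themselves are distributed, at the cost of paths that are no longer human-readable.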

Posted by fx1024, 07-17-2008, 03:49 AM
Xeentech, "stat" is a good idea. Thanks!
Zuborg, this is exactly what I was looking for! Thanks (it's weird, but I couldn't find anything relevant on Google...)

Posted by D3m0n, 07-17-2008, 05:08 PM
Any release date? Any site that we can look at for your layout and CMS ? Thx

Posted by fx1024, 07-20-2008, 03:41 AM
D3m0n I plan a first site to be on-line mid-Sep

Posted by D3m0n, 07-22-2008, 08:27 AM
Oh Great, Keep us informed about that
